Developer toolbox
Innovative tips, tricks, tools to improve your workflow.

Testing serverless Java API...

Where are the bugs?

If you're a developer, you've already written bugs. Not just one or two: you have written many of them. A bug is just an error, and hey, we're only human! But if you don't find a better approach, this will stay the case, and things won't improve. After some time, say a few months or a few years, you'll start to realise your code is hard to maintain and improve. To avoid this, you need to add tests to make sure that your code produces the expected result. This process spans from the first day to the last day of a project.

The most specific kind of test that we can perform is a unit test. Unit tests let you test a very narrow part of your code. This is the first big step in finding a bug. Make sure you're writing these tests beforehand (not rushing through the process) and taking the time to think about all acceptance criteria and complex cases of your business logic. At PALO IT this is our standard process, otherwise known as test-driven development (TDD): we don’t write a single line of business code if there’s not already a test to cover it.

Revisiting our framework documentation

If you write unit tests and respect the TDD process you should produce good, quality code. But does this suffice? Let’s imagine that you want to create a REST API that returns a JSON list of 50 bookings. To make it happen, you'll need to support the HTTP GET method and return JSON. If you're a Java developer, you'll probably choose Spring because it offers you the web support that you need, and Jackson to serialise your list of objects into JSON. So, you write unit tests for the expected list, create a service to build that list, expose this service in a controller, and run your tests: all green 🚦

But a problem can appear when you want to try this API for the first time. For example, if you are familiar with the Java encapsulation principle, you have probably (and by reflex) set all your fields as private.
If that's the case, a call to your service will fail. In fact, by default, Jackson does not serialise private fields unless they are exposed through getters, so your API will not be able to return anything. To solve the issue you need to configure Jackson or change your class.

In modern Java programming, several frameworks are often piled up: Spring, AspectJ, JPA, Jackson, Lombok, and so on. These frameworks can sometimes be incompatible with each other. This example echoes something very important: you should include all the frameworks and libraries that you use in your test strategy. In other words, your tests should go through all layers of the application to validate that the product works, not only the business code.

With this in mind, you'll be able to test the parsing/marshalling, the security, the network, and the integration of all your smaller parts of code where the framework acts like a glue between them. This is the second level of test, and you're probably familiar with it too: the integration test. Some teams neglect these tests because they consider them too slow to run and maintain compared to unit tests, but they're the only way to test what you are exposing to other teams or partners. It’s also important to see that it’s not an easy job to know if we have a good integration test, because it’s hard to measure the coverage of your code from a client perspective. But we always aim for top quality, so we need to reach this level of test and still follow the TDD process.

The “it runs on my laptop” effect

Now that you're writing unit and integration tests, and still following the TDD process, you should produce a quality project. But does it meet our client's expectations? If we're creating an API, we need to deploy it to consider the job done. This API is probably not the only one that you'll build, and it has to run somewhere.
It can be in a container that you deploy on Kubernetes, exposed through a Kong gateway, secured by Keycloak, with data from an Oracle database. It might also be integrated with an existing CRM and a lot of other external dependencies specific to the business of your client. You need to do that within different contexts, because you probably have more than just one environment. These are complex but real-world obstacles. A serverless architecture, like the one that we are using today when we deploy on AWS Serverless, also has a lot of complexity because you still have a gateway and external dependencies.

We want to be able to control risk by detecting bugs on the developer's computer, not after deployment or in continuous integration. To address this goal, we are using a framework designed for this specific purpose and maintained by us. You can find this open source project here. This framework allows us to emulate an AWS API gateway and the Lambda runtime environment, and provides enough integration to measure the code and branch coverage of integration tests. In effect, we merge the concepts of unit and integration tests: because we write microservices, the code is so simple that the granularity of unit tests and integration tests is identical.

Time for a real-world example! Imagine a method that allows us to find a user by using his or her phone number. The code is available here. Here we check the JWT token and, depending on the role, decide whether to throw an error back to the user. If the user is not present, we return a 404 with a message to explain that the user is not found.

Here are two examples of tests that we have for this endpoint. As you can see, we are testing the code as a real API consumer would: we call the endpoint with RestAssured and check the response that we receive. We get a success in the first case and an error with a description of what happened in the second case, and we're also able to measure the coverage of that.
Here's a part of the JaCoCo code coverage report:

The creation of the user is not part of this test. In fact, we are able to run DynamoDB locally and inject the data required by the tests. If we need to run other dependencies, we can mock them if that is sufficient.

Closing thoughts

This approach is fast, complete, and offers a distinct advantage: our integration tests cover what was previously done by unit tests, so we don't need to maintain those anymore, and can stay focused on the best level of test. This strategy helps us to find 95% of issues locally. Concretely:

- We write our API tests using a unit testing framework (JUnit)
- In a test, we use an HTTP client library like RestAssured to actually invoke our API gateway and Lambda based API, so we traverse all applicative layers, including framework and database
- We usually run a real database locally on the developer’s workstation, and simulators for complex external systems like 3rd party APIs
- Our framework allows us to run exactly the same tests locally (on a workstation) and remotely (after deployment on AWS). The remote testing helps capture the remaining 5% of problems linked to configuration, scalability, security policies, throttling and other niceties, and is an integral part of our DevOps strategy
- Finally, we validate our TDD practice by continuously measuring the code and branch coverage of our business code. If we apply TDD properly, it should be 100% all the time. In our CI setting, we even fail the build if we don’t hit 100% (of course, we also parse the code base to check that lazy developers don’t use magical annotations like @lombok.Generated to disable code coverage 😁)

The principles we exposed here work for a serverless architecture, but they can be applied as well for a more traditional, Spring Boot based approach, and of course also for other technologies.
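To make the "test as a real API consumer" idea concrete, here is a minimal sketch of the find-user-by-phone-number scenario. It is not our framework and it does not use RestAssured: the JDK's built-in `HttpServer` stands in for the emulated gateway, and `java.net.http.HttpClient` plays the consumer. The `/users` path, the `phone` query parameter, the in-memory user map and the JSON shapes are all assumptions made for illustration, and the JWT and role check is left out.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical endpoint: looks up a user by phone number, answers 404 when absent.
class UserApiSketch {

    // Stand-in for the data injected into a local database before the test runs.
    static final Map<String, String> USERS = Map.of("6512345678", "Alice");

    // Starts the API on a free port so that parallel test runs do not clash.
    static HttpServer start() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
            server.createContext("/users", UserApiSketch::handle);
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void handle(HttpExchange exchange) throws IOException {
        String query = exchange.getRequestURI().getQuery(); // e.g. "phone=6512345678"
        String phone = (query != null && query.startsWith("phone=")) ? query.substring(6) : "";
        String name = USERS.get(phone);
        int status = (name != null) ? 200 : 404;
        String body = (name != null)
                ? "{\"name\":\"" + name + "\"}"
                : "{\"message\":\"user not found\"}";
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().add("Content-Type", "application/json");
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }

    // Calls the endpoint over real HTTP and returns "<status> <body>".
    static String call(HttpServer server, String phone) {
        URI uri = URI.create("http://localhost:" + server.getAddress().getPort()
                + "/users?phone=" + phone);
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() + " " + response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

A test would start the server, call `UserApiSketch.call(server, "6512345678")` for the success case and an unknown number for the 404 case, then stop the server. Because the request goes through the HTTP layer, serialisation and routing are exercised together with the lookup logic.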

Business Driven Test...

Hi all! As promised, I am back with a follow-up post. This time, it’s mostly for developers who want to understand how to implement BDT in their Spring Boot Applications. Let’s do it step by step.

Step 1: By defining my acceptance criteria, I make sure I always have a story to work with. My tests are only as good as the acceptance criteria. Over the years, BDT has moulded me to understand the functionality first. The more I try to understand the functionality, the more I ask questions during inception or grooming. This helps me clarify each and every doubt I have and push the BAs to add all these clarifications in the acceptance criteria. Since we are not BAs, let’s assume we have a well-written story with acceptance criteria. Usually, users are greeted with “Hello World” when they browse our application.

Step 2: Another best practice that I follow is TDD. I write my tests first and code later. The tests I write will fail, and then we will write code to fix them. So let’s write a “feature file”. A feature file has an extension of “feature”. Feature files will have a scenario and steps to test that scenario. Let’s name the file “bdt.feature”. The following is the content of our feature file:

Scenario: Welcome Message
  Given BDT Application is up
  Then when we browse to the application we are greeted with a welcome message

Since I am assuming that the readers already know how to set up a bare minimum Spring Boot Application, I am not going through the steps of writing one. For now, I have a main method of a Spring Boot Application and nothing else. I will be committing to the git repo while I am writing this. The URL to the repo is here. See commit “b1a905d0298cc3d7065460982ec78e1c614f4f00”.

Step 3: Now that we have the steps to test a scenario, let’s set up the testing framework. The first thing we do is to bring in all the maven dependencies.
Once the dependencies are in, we need to write our Steps file that implements all the steps that we have written in our feature file. We also write a class named “BdtTest” that binds the feature file to the Steps file. To auto-generate step functions from the feature file, simply right-click the “BdtTest” class and choose “Run BdtTest”. Cucumber will then print the missing steps and the snippets to implement them. Simply copy these snippets into your Steps class and our test setup is ready. For code coverage, we just need to add the JaCoCo maven plugin. Let’s push the changes to git (Commit 4471560e3a95efa099526d1246ef595f8f886d4e).

Step 4: Before we write the implementation for the steps, let’s understand what is going on. This will load the Spring Context. In our case, we are loading the whole Spring Boot Application. It loads a WebServerApplicationContext and provides a real web environment. Embedded servers start and listen on a defined port (from your application.properties or test.properties) or on the default port of 8080. The @LocalServerPort annotation can be used to inject the actual port into your test.

Ok, let’s implement the step now. We need to call a REST GET API that would give us a “Hello World” message. Let’s call this API through a rest template and assert the response that we get. Since it’s a GET, we call the Rest Template “getForEntity” method. We need to pass the response type of the call, which is GreetingMessage.class. We have a compile-time error as we have not written this class yet. So let’s write it first. It’s just a POJO with a property named message. If we run the test now, we know that it will fail as we have not written the API yet. So let’s catch the HttpClientErrorException and throw an AssertionError. Ok, time for another git push (Commit 8fe53fbffb7846eb795b7536b413341c49c98ea4).

Step 5: Let’s make our tests pass. We now write our controller and service class. Tada! Our tests are now passing.
Time for committing (Commit 0d87d1743bb690d4acc34a6c1bb016190f791fc7).

Step 6: Let’s complicate matters a bit now. Let’s say the “Hello World” message is actually passed to our Service class from a third-party application that is hosted at localhost:8888. For practical purposes, this can be any third-party application such as MyInfo or Google. But during our test, we cannot rely on this service, so we shall mock it. To mock this, we bring in a maven dependency on WireMock. We set up a WireMock server before we run our tests, and shut it down after we are finished. We then stub the GET request to the third-party service and return a JSON. This JSON is then parsed by our Service class and is ultimately given back to the client, which in this case is our test. Let’s save our code first and then we can go through some explanations (Commit a375b2c8ef844fddb64d6c85718da8330b56b888).

Let’s understand the code now. Basically, when we start the test, we also start a WireMock server at localhost:8080. The WireMock server intercepts calls made to localhost:8080 and matches the URL with the stub that is configured. When it matches, it returns the stub response, which in our case is thirdparty.json.

Now the question is: the third-party service is hosted at http://localhost:8888, so how are we hitting localhost:8080 during a test? We are able to change this during the test, since we have externalised the domain and port. In application.properties, we have added the following: In test.properties, we have added the following in our test > resources folder: We read this property in our Service class through the @Value annotation and pass it to our rest template URL.

Now the only question left is how our Spring Boot Application will know to read test.properties. This is done by the below line: Spring Boot will read application.properties but will override it with test.properties whenever it finds a matching property. And that’s it!
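The override behaviour described above can be pictured with plain `java.util.Properties`. This is only a sketch of the layering idea, not how Spring Boot resolves its property sources internally. The keys `thirdparty.url` and `greeting.message` are hypothetical, and the values are chosen to match the ports mentioned in this post.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertyOverrideSketch {

    // Parses .properties-style text; lookups that miss fall back to 'defaults'.
    // A real project would load the files from the classpath instead of strings.
    static Properties parse(String text, Properties defaults) {
        Properties props = new Properties(defaults);
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Resolves a key the way the post describes: the test value wins when present,
    // otherwise the value from the application properties is used.
    static String resolve(String key) {
        Properties application = parse(
                "thirdparty.url=http://localhost:8888\n"
                        + "greeting.message=Hello World\n",
                null);
        Properties test = parse("thirdparty.url=http://localhost:8080\n", application);
        return test.getProperty(key);
    }
}
```

With this layering, `resolve("thirdparty.url")` yields the test value pointing at the mock server, while `resolve("greeting.message")` still comes from the application defaults because the test layer does not redefine it.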
I hope you are feeling a little more comfortable jumping on the BDT bandwagon. As promised in the first part of this series, I will come up with a very short third part, which shows you how to unit test a private method. Till then! Goodbye!

Hardware Vulnerabili...

In the first week of 2018, Meltdown and Spectre were publicly disclosed. The news of these vulnerabilities sent shockwaves across the world, with consumers and businesses terrified about their security posture and potential to be compromised. To understand why these vulnerabilities were such a problem, we need to understand what makes them unique.

Although there are hundreds of new vulnerabilities reported every day, the vast majority of them are in software. This includes the operating system (such as Windows 10, macOS, Android, or iOS) or the web browser you are using (such as Chrome or Firefox). When a software vulnerability is disclosed, developers can debug and diagnose what is causing the aberrant behaviour, fix the code that caused the vulnerability in the first place, and finally, release the patched version of the software to make it available immediately to everyone in the world.

Meltdown and Spectre, however, are hardware vulnerabilities. More specifically, these vulnerabilities are due to issues with the design choices and features of the hardware (in this case, the CPU chip). Depending on the vulnerability in question, proper safeguards on the software level might even be circumvented. This means that it may not be possible to “patch” the CPU at all; the only way to ensure security might be to buy a completely new CPU!

Although Meltdown and Spectre are sometimes considered a single vulnerability, it is more correct to think of them as a family of vulnerabilities that depend on specific features of modern CPUs. Meltdown relies on a feature called “out-of-order execution”. Abusing this feature allows an unprivileged user process to read the private memory of a different process, such as other applications or the kernel of the OS itself. This private memory may contain secrets or passwords. Spectre, on the other hand, relies on speculative execution.
Spectre works by allowing an unprivileged user to leak the memory of a different process, even if the process in question is perfectly written without any bugs and follows best practices. In fact, a well-written program is MORE susceptible to Spectre-type vulnerabilities because best practices mean more safety and error checking! Both of these are examples of side-channel attacks. A side-channel attack is one that relies on information inferred about the data in a computer based on its implementation, and on indirectly related signals such as computation timing, cache monitoring, and power monitoring.

While Spectre and Meltdown may be the most famous examples of hardware vulnerabilities, they are far from the only ones. Throwhammer and RAMBleed, for example, are vulnerabilities that take advantage of how memory (SDRAM) chips are manufactured. They belong to a family of vulnerabilities known as Rowhammer attacks. These are caused by a hardware design flaw in the chip. Normally, a memory chip is made up of memory cells arranged in a grid pattern. Each cell stores the value of a single bit (0/1). A high voltage corresponds to a 1 and a low voltage corresponds to a 0. In 2014, researchers found that if the same row of cells is read over and over again, the resulting electrical interference can flip the bits in the adjacent rows. This means that, theoretically, it is possible to use this attack to modify the data of other processes, i.e. either corrupt or manipulate data.

Throwhammer is a vulnerability that allows Rowhammer attacks to be carried out over a network due to the Remote Direct Memory Access (RDMA) feature of server-grade network cards. RAMBleed is a variant that combines Rowhammer with a side-channel attack to make it possible to steal data from adjacent memory cell rows, rather than just modifying it.
When a vulnerable design choice or feature is discovered, the offending feature is investigated more thoroughly by security researchers, and more variant vulnerabilities are usually discovered over time. Meltdown, for example, has at least 6 variants, while Spectre has at least 9. This time lag can result in negative PR for the companies involved. Furthermore, the research into Meltdown and Spectre eventually led to the discovery and categorisation of “Microarchitectural Data Sampling (MDS) attacks” after finding two new families of vulnerabilities: Fallout and RIDL. These are similar to Meltdown/Spectre in that they are side-channel attacks and can be used to leak passwords and secrets. They take advantage of MDS to expose data leaving internal CPU buffers, which can include non-cached data. While Meltdown and Spectre depend on knowing which CPU chipset is used by the machine to successfully exploit the vulnerability, Fallout and RIDL do not require such information. This makes it much harder to mitigate these vulnerabilities. The best way to mitigate them is to disable hyperthreading on all CPUs, which may result in a noticeable performance drop.

Most vulnerabilities take advantage of a specific application with vulnerable code. Anti-virus tools usually work by comparing the contents of each file with a database of malicious code signatures. If there is a match, that file is considered to be malicious. In contrast, the attacks discussed so far can be abused as part of any piece of software that runs on a machine, not necessarily a malicious, pre-compiled application binary. This makes them extremely hard (but not impossible) for anti-virus solutions to discover. Furthermore, most hardware vulnerabilities do not leave any trace in log files, as they bypass most of the software layer.

While it may be difficult to rule out this kind of attack entirely, such attacks are quite difficult to pull off in practice.
This is because they usually require local code execution. Also, it may take a combination of vulnerabilities to steal actionable data; a single vulnerability by itself may not be able to accomplish much. These attacks are also usually very slow, and thus require a prolonged period of exposure to allow an attacker to steal/corrupt data. Meltdown, for example, can only read memory at ~120 KB/s.

Mitigating hardware vulnerabilities can be troublesome due to the lack of one-size-fits-all solutions. Depending on the hardware, the vendor, and the variant of the vulnerability, the mitigations will be different. This makes it very difficult to know if you are affected without doing some research. Furthermore, when a new family of vulnerabilities is discovered, mitigation might mean sacrificing performance (or money if you need to replace hardware).

Although mitigation is tough, it is not impossible. It starts with having thorough knowledge of all your hardware assets. This allows you to check if there is a new security advisory or a patch available. By looking at what data is most critical and sensitive, you can add layers of security and monitoring controls to protect that data, i.e. practise defence-in-depth. This may make it uneconomical for you to be targeted. In cases where resources are not a concern for the attacker, it may not be possible to stop the attacker. However, the extra time that defence-in-depth buys may allow you to determine who the attacker is.

When a new vulnerability is discovered, the most important thing is to mitigate immediately. Most software vendors will quickly release instructions on how to do this. Additionally, as most hardware vulnerabilities require local execution, it is extremely important to have good physical security. Do not leave your computer/phone anywhere public as it can be easily tampered with.
Do not leave your devices turned on and idle for extended periods of time, as most of these attacks are quite slow. Hardware vulnerabilities are a very thorny problem that will only get worse as computers, phones, and IoT devices become increasingly ubiquitous. Vigilance and a proactive approach are the best tools in this fight.
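To make the idea of a side channel more tangible, here is a small software-only sketch. It is not Meltdown, Spectre, or Rowhammer, which depend on CPU and memory internals; it is a simpler, well-known timing leak in byte-by-byte comparison. The number of comparisons is used as a deterministic stand-in for elapsed time, and the secret and guesses are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

class TimingSideChannelSketch {

    // Leaky comparison: stops at the first mismatch, so the amount of work
    // depends on how many leading bytes of the guess are correct.
    // Returns the number of byte comparisons performed (a proxy for time).
    static int leakyCompareSteps(byte[] secret, byte[] guess) {
        int steps = 0;
        for (int i = 0; i < secret.length && i < guess.length; i++) {
            steps++;
            if (secret[i] != guess[i]) {
                break;
            }
        }
        return steps;
    }

    // Mitigation: MessageDigest.isEqual examines all bytes of its first argument,
    // so its running time does not reveal where the first mismatch occurred.
    static boolean constantTimeEquals(byte[] secret, byte[] guess) {
        return MessageDigest.isEqual(secret, guess);
    }

    static byte[] bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

For the secret `hunter2`, the guess `aaaaaaa` stops after 1 comparison while `huaaaaa` stops after 3. An observer who can measure that difference learns which guess shares a longer prefix with the secret without ever reading the secret directly, which is the essence of inferring data from an indirectly related signal.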

Business Driven Test...

Hi there! I thought of sharing my approach to testing CRUD applications. First, a little introduction about myself: I am a senior engineer working on backend system design and the architecture of web applications. When I first started working, I understood one thing very quickly: a robust, resilient, and scalable application needs thorough testing. Like most developers, I started with JUnit to write unit tests. But slowly, with time, I started using Cucumber. When I made the shift, a lot of people from the developer community warned me that using Cucumber entailed doing more integration tests rather than unit tests, that testing using Cucumber amounted to black box testing, and that it would not help in test coverage reporting. Due to this, I was a little sceptical at first. But as I continued using Cucumber, I grew fonder of it. I will try to list the reasons why I like it, while addressing some concerns.

Firstly, I accept that using Cucumber means more of an integration test rather than a unit test. But even so, where’s the harm in that? I generally use this testing technique for REST APIs. In most cases, they do simple CRUD operations. Rather than testing a specific method with different inputs, I prefer testing the whole business flow by mocking client calls providing the same range of data. And since I use feature files to do so, I can map each test case to a user acceptance criterion.

Now, don’t get me wrong, I do use JUnit as well. When there is specific code that is algorithm heavy, I prefer writing multiple JUnit test cases to test it. For example, a method that outputs a reference number depending on the logged-in user and the current day of the year. In this scenario, it makes sense to write a JUnit test case to test the algorithm by iterating through a multitude of inputs. (I will share how I test a private method using JUnit in a later post!)

Using Cucumber is not black box testing. You can measure the code coverage.
Since you are testing a feature, it touches all the layers and gives you coverage. On the other hand, using JUnit involves writing a lot of tests. This is one of the major reasons why I like using Cucumber: more coverage is achieved in fewer tests while covering all features and scenarios. We can use the Cucumber-Spring integration and the JaCoCo maven plugin to generate the coverage. The Cucumber-Spring integration can be added to the project quite easily just by adding a maven dependency. Thus, all the tests can be executed as part of a maven build. Since the application will be booted before running the tests, it also gives confidence to the developer that the Spring Boot application will boot up without any issues and that the application context and dependency injection behave the way they should. When a build passes after running the test cases, it is therefore much more likely that the deployed build will start up successfully too.

Some have concerns about features that have integration touchpoints with upstream or downstream applications or database operations: how can we use Cucumber then? Mocks to the rescue! If we are concerned with the database, we can use an in-memory database like H2. An import.sql file placed in the resources folder can be used for seed scripts. Embedded Redis or Hazelcast comes in handy for tests that use a cache. An embedded Redis can be easily added by adding a maven dependency and setting the scope to test. Sometimes, there can be even easier solutions. For example, you can set the cache type to in-memory by adding spring properties such as spring.cache.type=simple. Similarly, Embedded Kafka or ActiveMQ can be used for mocking integrations with an MQ. If the integration with another system is over REST or SOAP, the communication can be mocked by using WireMock. Loading the spring properties and starting the WireMock server and other embedded servers are part of the test setup, and these servers should be stopped as part of the clean-up code once the tests have run.
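The setup-and-clean-up rule above can be sketched with the JDK alone. This is not WireMock's API: the JDK's built-in `HttpServer` plays the role of the stubbed downstream system, and the class name, the `/greeting` path and the JSON body are assumptions for illustration. Implementing `AutoCloseable` lets a try-with-resources block guarantee that the stub is stopped once the test has run, even if an assertion fails.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a mocked REST dependency with an explicit lifecycle.
class StubServerSketch implements AutoCloseable {

    private final HttpServer server;

    // Test setup: start the stub on a free port and register one canned response.
    StubServerSketch(String path, String jsonBody) {
        this.server = create();
        byte[] bytes = jsonBody.getBytes(StandardCharsets.UTF_8);
        server.createContext(path, exchange -> {
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
    }

    private static HttpServer create() {
        try {
            return HttpServer.create(new InetSocketAddress(0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String baseUrl() {
        return "http://localhost:" + server.getAddress().getPort();
    }

    // Test clean-up: stop the stub so the port is released.
    @Override
    public void close() {
        server.stop(0);
    }

    // Calls the stub the way the application under test would call the real system.
    static String fetchGreeting() {
        try (StubServerSketch stub =
                     new StubServerSketch("/greeting", "{\"message\":\"Hello World\"}")) {
            HttpRequest request =
                    HttpRequest.newBuilder(URI.create(stub.baseUrl() + "/greeting")).GET().build();
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString())
                    .body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In a Cucumber or JUnit suite the same start and stop calls would normally live in before and after hooks; the try-with-resources form is just the most compact way to show that setup and clean-up belong together.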
As a summary, I choose to use BDT for REST API based applications as it:

1. Helps me write tests in plain English that can be mapped to acceptance criteria
2. Gives me higher code coverage with fewer tests
3. Gives confidence to dev ops by actually testing the loading of the Spring Boot application during the build process, thus preventing application load issues after deployment
4. Gives confidence to the developer by actually testing the flow the way an actual user or manual tester would, calling the API through a client (browser, mobile, etc.)

Before wrapping up, let me reiterate that the points I have mentioned here are my own beliefs. If a developer prefers JUnit over Cucumber and is comfortable and confident in their code by using it, they should continue using it as their preferred choice. Whichever weapon we choose, our end goal is the same: bug-free code :)

P.S. For more hardcore developers, I will share code and a GitHub URL to show how Cucumber and JaCoCo can be used to test a Spring Boot Application in my next blog. Till next time. Have a great day ahead!

DevSecOps And The Si...

Earlier this year, I had the wonderful opportunity to attend Cloud Asia Expo 2019 in Singapore, a large-scale event featuring some of the most promising brands and speakers. Not only was I fortunate enough to be in the company of many like-minded people who had come from different backgrounds and cultures, it was also amazing to experience the breadth of topics that were covered under one roof over two days. My main interest was learning more about security so I decided to focus more on Dev-Security-Ops and its implementation. Here are my takeaways!

DevSecOps

Although organisations that adopt a DevOps culture will benefit from its successful implementation, such as higher product quality and customer satisfaction, and faster time to market through complete automation and team collaboration, we cannot have a well-working DevOps workflow in place without an emphasis on security. In a DevOps culture where development and operation teams are collaborating and working together, we cannot have security silos. In order to achieve software compliance, security needs to be integrated in every phase of the workflow by enforcing multiple checkpoints.

It’s no secret that the frequency of cyber crimes has increased over the years. In June 2018, Singapore experienced its worst cyber attack when its largest health care system was hacked, resulting in almost 1.5 million patient records being compromised. These data included IC numbers, names, addresses, gender, race, and birth dates. Imagine how much worse it could have been if the attackers had managed to gain access to patient payment details.

So, what was the cause of this hack, you ask? Simple: it was too easy. Systems were vulnerable to malware, security policies were not enforced, and the network and security teams were not proactive enough to identify and monitor the system with frequent checks or alerts.
This is just one scenario which received a great deal of exposure, proving that providing a secure environment or software to consumers is as important as delivering a functional product. With DevSecOps methodologies, we are able to implement an effective security system that can prevent these types of mishaps from happening in the future.

What exactly does Dev-Sec-Ops mean?

DevSecOps is a simple evolution of DevOps that emphasises the importance of security in the software release pipeline. To achieve a DevSecOps culture, we must take into consideration each individual’s role, their background and their interest in different aspects of DevSecOps practices, and enforce regulatory governance and software compliance measures.

What Was and What should be?

In the legacy SDLC model, security practices were enforced during the end phase, since most of the focus was directed at application development. However, the discovery of security threats at a later stage resulted in countless reworks and time-consuming tasks. It also left software vulnerable, without properly enforced regulatory and compliance measures. With DevOps implementation, teams work together during the different stages of the entire process, such as development, CI, build, test, and release! That being said, in order to achieve complete DevSecOps, we must incorporate security in all stages by enforcing regulatory governance and software compliance measures.

How to achieve DevSecOps?

As a DevSecOps culture is similar to a DevOps culture, it can be successfully practised and implemented by following some best practices.

Shift Left Approach to DevOps

Shift Left means to have security testing enforced at the nascent stage of development instead of waiting until the end. Doing so will help to identify potential vulnerabilities early on and help you to fix them with minimal cost.
Even though it could be complex to apply this, since it might disrupt the DevOps workflow, it is very advantageous in the long run.

Adopt Microservices and Containerisation

By adopting a microservice architecture, large and complex systems can be simplified and broken into simple services, which in turn helps to increase agility in the system. Thus, any business changes can be implemented faster and more effectively. These microservices can be deployed as containers that enable easy maintenance of application security.

Implement CI and Automation

By automating as many processes as possible during development, manual interventions for security and operations can be avoided, which results in faster execution and more secure releases.

Implement Scanning and Monitoring

We can scan applications for vulnerabilities during run time (dynamic scanning) and scan source code repositories (static scanning), remediating findings from time to time by refactoring, for example to update libraries and versions. Constant monitoring of applications should be in place, and an alert mechanism that can notify you of abnormal behaviours in the server or application must be implemented.

DevSecOps Security Testing Tools

The deployed application, infrastructure, source code, and even pipelines need to be secured from the outset. This can be achieved using certain tools that can help in continuous testing so that issues can be addressed immediately. A few of these tools are:

- Dynamic Application Security Testing (DAST) tools
- Static Application Security Testing (SAST) tools
- Interactive Application Security Testing (IAST) tools

Conclusion

DevSecOps is not a bunch of tools or security practices; it is a cultural shift like DevOps. It is a natural and necessary response for modern delivery pipelines that can overcome the bottleneck effect we faced in older security models.

DevSecOps: How Can W...

When we think about the prevention of breaches and remaining secure, we usually start with tools such as WAF (Web Application Firewall) and anti-virus software. We then automate them to improve effectiveness, hoping this will protect our most valuable data. But despite our preventive measures, why do security incidents continue to grow? Because we are leaving people out of the picture.

Educating people should be the first and the final goal. Ignorance, trust, greed, and laziness can open doors to attacks from anywhere in the world. Cybersecurity is a people problem, and the solution lies in the people. With tools as supporting resources, we can create a secure and continuous environment.

Based on the agile methodology, we are all one team divided into groups of people with different outputs. The key is to integrate those outputs into a continuous workflow, where security is included in every step. Security shouldn’t be an isolated department. We need to eliminate the security bottleneck by incorporating it into engineering. This way all members can help each other embrace a secure system. Sprint by sprint, the whole team must work together on the integration to reduce the risk level and increase security.

In the Coding & Building Phase → SAST & SCA

→ Run Static Application Security Testing (SAST) against the source code or binaries of an application.

- After every sprint, write new rules into the SAST or linter/code quality tool based on reports, and tune them to avoid false positives
- It can be embedded into the IDE
- Tools that analyse Abstract Syntax Trees are the best approach, as they are more accurate
- Integrating SAST into the repository with every Pull Request yields great advantages
- Only run incremental scanning on every commit (commit hook) so you don’t slow down the dev process

Tools: Bandit, Brakeman, NodeJSScan, FindSecBugs, GoSec

→ Run Source Composition Analysis & Software Bill of Materials.

Software is rarely built from scratch.
It is usually a complex collection of various libraries and third-party dependencies.

- Avoid inherited risk
- Manage licence compliance
- Identify component age
- Identify known vulnerabilities
- Avoid typo-squatting (lookalike naming for modules). Write rules to flag NPM modules whose names are near-identical to existing ones or differ only by punctuation

Tools: CycloneDX, NPM Audit, OWASP Dependency Track, Safety, OWASP Dependency Check, Snyk

→ Safeguard Secrets/Sensitive Information in Centralised Source Repos

- Avoid sharing API keys, passwords, etc.

Tool: Talisman

In the Testing Phase → DAST & IAST

→ Parameterised DAST with Test Frameworks

- Give the QA team a single fabric for both test automation and security testing
- Use proxy tools. They allow you to drive traffic through the desired functionality
- DAST scanners perform far better with “parameterisation”, especially on SPAs and dynamic front-end apps. It allows the tool to bypass the spidering process, which is time-consuming and ineffective
- Integrate security into unit test cases
- You can create your whole pipeline in Robot Framework. This way it is technology agnostic and can be versioned

Tools: Selenium, OWASP ZAP, Robot Framework

→ Interactive Application Security Testing

An IAST agent sits alongside your build agent, has access to the source code and reads all requests, registering findings in the app based on the flows.

- Run it in your staging or dev environment
- With this test in place, false positives are extremely low
- It is faster than most DAST and SAST tools
- It can match source code (SAST) and payload (DAST)
- It combines three tools in a single set of workflows: SAST, DAST and SCA

Tools: Contrast Security, OpenRASP, Synopsys, Checkmarx

While Releasing & Deploying → Security in IaC (Infrastructure as Code)

→ Scan your container configuration

Containers are just configuration-based services. They run on the same machine alongside many others.
A weak configuration allows an attacker to gain access to the machine through your container.

- Avoid manual deployments
- Run SAST against Kubernetes YAMLs and specifications
- Run SAST against Dockerfiles and container specifications
- Don’t run the container with root privileges

Tools: DockerSlim, AppArmor, SELinux

Let’s Build Pipelines!

TIP: If the team is not mature in terms of security, don’t break the build. Work with them to understand why it would break. Start with an autonomous pipeline, then gradually move to an incremental one.

Autonomous Pipeline

- Security must be oriented around a parallel pipeline. This way you don’t slow down the dev team’s process
- The security engineer in charge of this pipeline must HELP the dev team to learn
- Use multiple tools for the same purpose to validate findings
- Open source libraries give you the power to extend and customise the tool based on your needs
- Weekly or after each release, run exhaustive SAST, Source Composition Analysis and DAST (backed by test automation)
- Every two months, run exhaustive DAST with all test cases

Incremental

- SAST and SCA run against the current code commit and the affected files
- DAST runs against smaller modules on a daily basis, backed by test automation

Correlating results

- Integrate results with JIRA or other bug tracking systems
- Integrate with test management frameworks
- Leverage JIRA webhooks to track state

Tools: Archery, DefectDojo

→ Success Factors ←

- Dockerise all tools so you can run them in any environment
- Use several different tools for the same purpose to avoid false positives
- Open source tools give you the flexibility to adjust them
- Categorise the vulnerabilities by CWE id for better integration across all tools. Different tools may call the same vulnerability by different names. Using the CWE id helps you avoid duplication
- Gather metrics and inject findings into the process (update regression tests, patch outdated dependencies, etc.)
← this is how the pipeline learns to be more secure

Now Integrate the People!

1. Business team <-> Security engineer

The business team needs to work with security engineers to identify their biggest concerns. Concurrently, security engineers drill down on common threats based on their experience. Together, they can create user stories with associated threat modelling.

2. QA team <-> Security engineer

The QA team creates test cases based on these user stories (acceptance tests). Based on the initial threat modelling, security engineers can introduce abuser stories, where the tests are written from an attacker’s point of view.

3. Dev team <-> Security engineer

A training programme is very important. The security engineer should run a workshop to teach the dev team to code securely, focusing on topics such as the biggest vulnerabilities and how to identify them, as well as on how to be a part of the team and not just a consultant. If the dev team writes less buggy code, building stronger security capabilities will be much easier. The security engineer can also add security comments on pull requests, as that is where most devs like to read them. The security engineer should also put in place any tool that can help devs recognise insecure code automatically and as early as possible, and implement SAST (static application security testing) integrated in the IDE and on every build. Threat modelling concerns should be injected into the dev unit tests as well.

4. Pen tester <-> Security engineer <-> QA team

The pen tester needs to run manual tests for logic flaws. The security engineer, along with the QA team, can then turn the findings from the pen tests (app logic) into regression test scripts and include them in the pipeline.

5. Incident Response team <-> Dev team <-> Security engineer

Incident response involves a security engineer who can identify the breach based on existing patterns and allocate the right dev team to fix the code.

- Track the right metrics. DeviceID, IP, and UserID will provide the information required to trace an incident back and identify its origin
- Query efficiently. Too much information is not helpful if it is not readable
- Equip the team with the right tools. OSQuery is great
- Plan for the AppSec attack. Simulations will keep everyone ready. You don’t want to improvise at 3am…

Don’t forget…

→ If you have experienced culture problems before, or are experiencing them now, it’s unlikely that new technology will fix them. Fix your culture problems first.

→ Continuously educate all team members about security based on the findings and get them gradually more involved. Engage, rather than enforce.
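
The Success Factors advice to categorise vulnerabilities by CWE id, so that the same issue reported by different tools is not counted twice, could look roughly like this. It is only a minimal sketch: the finding shape (tool, cweId, file, line, title) and the tool name other-sast are hypothetical, and each real scanner would need a small adapter to produce this shape from its own report format.

```javascript
// Hypothetical normalised finding: { tool, cweId, file, line, title }.
// Findings are treated as duplicates when CWE id and location match,
// regardless of the name each tool gives the vulnerability.
function dedupeByCwe(findings) {
  const merged = new Map();
  for (const f of findings) {
    const key = `${f.cweId}|${f.file}|${f.line}`;
    if (!merged.has(key)) {
      merged.set(key, { cweId: f.cweId, file: f.file, line: f.line, tools: [], titles: [] });
    }
    const entry = merged.get(key);
    if (!entry.tools.includes(f.tool)) entry.tools.push(f.tool);
    if (!entry.titles.includes(f.title)) entry.titles.push(f.title);
  }
  return [...merged.values()];
}

// Illustrative data only; not real scanner output.
const findings = [
  { tool: "bandit", cweId: "CWE-89", file: "app/db.py", line: 42, title: "Possible SQL injection" },
  { tool: "other-sast", cweId: "CWE-89", file: "app/db.py", line: 42, title: "SQLi via string formatting" },
  { tool: "bandit", cweId: "CWE-798", file: "app/config.py", line: 7, title: "Hardcoded password" },
];

for (const issue of dedupeByCwe(findings)) {
  console.log(`${issue.cweId} ${issue.file}:${issue.line} reported by ${issue.tools.join(", ")}`);
}
```

An issue confirmed by more than one tool is also a reasonable candidate for higher priority, which ties in with the advice to use several tools to validate findings.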

CoreOS Clair — Part ...

Clair is one of the most popular open source tools providing static image scanning for container images. In my previous post, I presented some background about CoreOS Clair and the way it works. In this post, I will delve into Clair installation and integration with Klar and clairctl. All work for this session was done in the Google Cloud environment.

The setup:

Setup 1: Kubernetes cluster with 1 master node and 3 worker nodes.
Setup 2: Single compute node running Docker (instance-1)
Other: Single compute node to interact with Clair APIs (instance-2)

Installing Clair

Clair can be installed in two different ways:

- Installing with Docker.
- Deploying on Kubernetes.

With Docker, let’s assume you have a working Docker environment.

    # Create the Clair configuration directory
    mkdir clair_config

    # Download the Clair config file
    curl -L https://raw.githubusercontent.com/coreos/clair/master/config.yaml.sample -o clair_config/config.yaml

    # Spin up the Postgres container
    docker run -d -e POSTGRES_PASSWORD="" -p 5432:5432 postgres:9.6

Important: The CVE database needs some time to update. Meanwhile, you can move on to the next step; the definitions will be ready only about 30 minutes after Postgres starts.

    # Start Clair with the config YAML in place
    docker run --net=host -d -p 6060-6061:6060-6061 -v $PWD/clair_config:/config quay.io/coreos/clair:latest -config=/config/config.yaml

The Clair API server runs on TCP:6060, while the Clair health API runs on TCP:6061. To verify, call the health API.

    curl -X GET -I http://localhost:6061/health

To deploy Clair in Kubernetes, simply deploy Postgres and Clair as deployments.

    # Download the Clair repository (release-2.0 branch)
    git clone --single-branch --branch release-2.0 https://github.com/coreos/clair

    # Create the Clair secret
    kubectl create secret generic clairsecret --from-file=./config.yaml

    # Create the Clair deployment; this will spin up the Postgres and Clair pods
    kubectl create -f clair-kubernetes.yaml

    # Verify
    kubectl get pods
    kubectl get services
    kubectl describe service clairsvc

According to the service description, Clair is running at NodePort TCP:30060, and can also be accessed at the endpoints 10.20.0.3:6061 and 10.20.0.3:6060. So, let’s access it from instance-2.

    curl -X GET -I http://10.20.0.3:6061/health

Now that the Clair setup is ready, let’s try using some Clair client tools to run some scans and analyses.

Clair with Klar

Klar is a very simple and lightweight command-line tool that doesn’t require any installation. Simply download the binary and copy it to a location available in the $PATH.

    # Download the desired klar binary from the klar GitHub site
    wget https://github.com/optiopay/klar/releases/download/v2.4.0/klar-2.4.0-linux-amd64

Look up https://github.com/optiopay/klar for more information.

Running an image scan with Klar:

    CLAIR_ADDR=10.20.0.7:6060 \
    CLAIR_OUTPUT=High \
    CLAIR_THRESHOLD=10 \
    DOCKER_USER=anuradhai4i \
    DOCKER_PASSWORD=xxxxxxxx \
    klar anuradhai4i/release_1.0

Please refer to the detailed documentation for Klar command-line parameters.

Clair with clairctl

Download the clairctl binary from the release page and add its location to the $PATH.

    wget https://github.com/jgsqware/clairctl/releases/download/v1.2.8/clairctl-linux-amd64

clairctl needs some basic configuration stored in a file named clairctl.yml, so save the following content in clairctl.yml and update it accordingly.

    clair:
      port: 6060
      healthPort: 6061
      request:
        host: HOST
        headers:
          myHeader: header
      uri: http://10.20.0.3
      report:
        path: ./reports
        format: html
    docker:
      insecure-registries:
        - "my-own-registry:5000"

Don’t forget to create a report directory in the desired location and update clairctl.yml.

Now we can check the health:

    clairctl --config=clairctl.yml health

The result should come back as a success if Clair is properly deployed and its endpoints are reachable! Hurray! We can now scan the image.
But before that, ensure that you are logged in to Docker Hub with docker login from the command line. Just like in the previous example, the scan results can be seen in text format. Similarly, executing clairctl with report instead of analyse will generate a scan report.

    clairctl --config=clairctl.yml report anuradhai4i/release_1.0

This will create and store HTML reports in the location specified in clairctl.yml.

To conclude, the links given at the end of each vulnerability description point to the relevant CVE updates and definitions. More details and fixes can also be found there for each issue.

Further reading:

- coreos/clair
- jgsqware/clairctl
- benfab/clair-demo
- optiopay/klar
- leahnp/clair-klar-kubernetes-demo
- Build, Collaborate & Integrate APIs | SwaggerHub
- Static Security Analysis of Container Images with CoreOS Clair
- Static Analysis of Docker image vulnerabilities with Clair

How To Effectively I...

As organisations migrate more and more computing to the cloud, they are at risk of becoming more susceptible to malicious attacks. When it comes to the cloud, there’s a difference between what a cloud provider sees and what an attacker sees.

A cloud provider’s perspective:

- Cloud is ever present, ever accessible
- Provides a wide range of computing services
- Enables rapid development and deployment
- Cloud consumption is increasing rapidly

An attacker’s perspective:

- Can be continuously and relentlessly attacked
- A wide surface area to attack
- Easy to make mistakes and configuration errors
- Makes a super attractive target

Most attacks are not new: malware, password brute forcing, credential theft, DDoS, and SQLi are all common in legacy and on-premise systems. Aside from these, new types of attacks are emerging in cloud environments, such as password spraying, crypto miners, harvesting of secrets/subscription keys, and file-less attacks. For instance, password spraying takes one password and tries it against many accounts, while password brute forcing takes one account and throws many passwords against it. There have been many reports of attacks along the supply chain and on misconfigurations in cloud infrastructure.
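
The difference between password spraying and brute forcing shows up clearly in failed-login data. The sketch below is only an illustration: the event shape (sourceIp, account), the threshold of 3, and the sample addresses are assumptions, and a real spray may be spread across many source addresses and a longer time window.

```javascript
// Classify failed logins (hypothetical shape: { sourceIp, account }).
// Brute forcing: many failures against ONE account.
// Spraying: one source failing against MANY different accounts.
function classifyFailedLogins(events, threshold = 3) {
  const failuresPerAccount = new Map(); // account -> number of failures
  const accountsPerSource = new Map(); // sourceIp -> Set of targeted accounts

  for (const { sourceIp, account } of events) {
    failuresPerAccount.set(account, (failuresPerAccount.get(account) || 0) + 1);
    if (!accountsPerSource.has(sourceIp)) accountsPerSource.set(sourceIp, new Set());
    accountsPerSource.get(sourceIp).add(account);
  }

  const bruteForcedAccounts = [...failuresPerAccount]
    .filter(([, count]) => count >= threshold)
    .map(([account]) => account);
  const sprayingSources = [...accountsPerSource]
    .filter(([, accounts]) => accounts.size >= threshold)
    .map(([sourceIp]) => sourceIp);

  return { bruteForcedAccounts, sprayingSources };
}

// Illustrative data: one source tries four accounts once each,
// another source hammers a single account five times.
const failedLogins = [
  { sourceIp: "203.0.113.5", account: "alice" },
  { sourceIp: "203.0.113.5", account: "bob" },
  { sourceIp: "203.0.113.5", account: "carol" },
  { sourceIp: "203.0.113.5", account: "dave" },
  ...Array.from({ length: 5 }, () => ({ sourceIp: "198.51.100.9", account: "eve" })),
];

console.log(classifyFailedLogins(failedLogins));
```

Note that the spraying source never crosses a per-account failure limit, which is why account lockout alone tends to miss this attack.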
When we think about attacks on the cloud, we can group them as follows:

Tenant level (any organisation that puts its infrastructure in the cloud)

- User elevated to tenant admin
- Multi-factor authentication changed

Subscription level

- External accounts added to subscription
- Stale accounts with access to subscription
- Attack detection service not configured properly

IaaS

- Known hacker/malicious tool/process found
- Account password hash accessed
- Anti-malware service disabled
- Brute force login attack detected
- Communication with a malicious IP
- TOR IP detected
- File-less attack technique detected
- Outgoing DDoS attacks

PaaS

- Malicious key vault access (keys enumerated)
- Anonymous storage accessed
- Activity from unfamiliar location
- SQL injection detected
- Hadoop YARN exploit
- Open management ports on Kubernetes nodes
- Authentication disabled for App/Web services

SaaS

- A potentially malicious URL click was detected
- Unusual volume of external file sharing
- Password spray login attack

We should think about all these areas that need to be secured. Besides securing cloud infrastructure, it is also important to put a good monitoring mechanism in place to respond to any kind of incident. But the problem is: are SOCs (Security Operations Centres) really prepared? There are many challenges surrounding the implementation of a cloud monitoring system that prevent SOCs from keeping up to date.

- Most cloud platforms are tenant-based or subscription-based, which creates new boundaries
- Many cloud services = many attack types, and these attacks are becoming more sophisticated
- Since cloud environments are still relatively new, gaining familiarity with this technology involves a steep learning curve
- If you have an on-premise SOC and want to create a hybrid environment, detection and investigation become complex
- Cloud infrastructure and services are a lot more dynamic in nature.
Organisations will keep on running new services while cloud service providers concurrently release new features at a rapid pace. Furthermore, DevOps and SRE teams make frequent changes to their production systems. It will take a huge amount of effort to keep SOCs up to date with these new services.

If our servers are on-premise, we have control over the network. If an incident happens, we can take actions such as blocking an IP or taking down the machine. However, we may not have the same flexibility in the cloud. Monitoring will require establishing partnerships between SOC analysts, cloud resource owners, subscription owners, and cloud service providers. SOC analysts may even need help from cloud resource owners to obtain resources for investigations or to implement remediation steps.

In order to implement an effective cloud monitoring system, we have to identify the logs and events that are generated by the aforementioned attacks. We can divide event types into four categories:

- Control plane logs, e.g. create, update, and delete operations for cloud resources
- Data plane logs, e.g. events logged as part of cloud resource usage, Windows events in a VM, SQL audit logs, etc.
- Identity logs. When you design cloud infrastructure, you need to identify the identity architecture. It should be possible to map an identity to any action, using AuthN and AuthZ events, AAD logs, etc.
- Baked alerts, e.g. ready-to-consume security alerts from ASC, CASB, etc.

It’s very beneficial to have a common raw events repository and an alert/log template that can help in log analytics. The common template should include the following data: Event ID, Event name, Subscription ID, Resource name, Resource ID, Event time, Data centre, Meta data, Prod or dev, Owner ID, User ID, Success or failure. This can help you build custom monitoring scenarios and help your SOC to run investigations.
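
The common template listed above can be enforced with a small normalisation step before events reach the shared repository. This is a hedged sketch, not a prescribed schema: the camelCase field names are my own rendering of the template fields, and the sample event is invented for illustration.

```javascript
// Field names are a hypothetical camelCase rendering of the template:
// Event ID, Event name, Subscription ID, Resource name, Resource ID, Event time,
// Data centre, Meta data, Prod or dev, Owner ID, User ID, Success or failure.
const TEMPLATE_FIELDS = [
  "eventId", "eventName", "subscriptionId", "resourceName", "resourceId", "eventTime",
  "dataCentre", "metadata", "environment", "ownerId", "userId", "outcome",
];

// Returns the event in template shape plus the list of fields the source did not supply,
// so gaps are visible instead of silently dropped.
function toCommonEvent(raw) {
  const event = {};
  const missing = [];
  for (const field of TEMPLATE_FIELDS) {
    if (raw[field] === undefined || raw[field] === null) {
      event[field] = null;
      missing.push(field);
    } else {
      event[field] = raw[field];
    }
  }
  return { event, missing };
}

// Illustrative control-plane event; all values are made up.
const normalised = toCommonEvent({
  eventId: "evt-001",
  eventName: "KeyVaultSecretList",
  subscriptionId: "sub-123",
  resourceName: "prod-vault",
  resourceId: "res-456",
  eventTime: "2020-01-01T03:00:00Z",
  dataCentre: "dc-1",
  environment: "prod",
  userId: "user-42",
  outcome: "success",
});

console.log(normalised.missing);
```

Reporting the missing fields per log source is one way to find out which sources cannot yet support a given monitoring scenario.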
Some alerts and logs can be false positives, which may generate a lot of load for the SOC. To prevent overloading, we can configure limits so that such alerts are redirected to the resource owner. If the resource owner feels that a certain alert or log needs an investigation, they can then redirect it to the SOC.

The design and architecture of SIEM (Security Information and Event Management) systems is evolving too. If your on-premise infrastructure already has a SIEM set up, it’s better to start by bringing cloud events into the on-premise SIEM. Most cloud providers have connectors to popular SIEMs that make integration seamless. Over time, you can also consider moving to a cloud-based SIEM, so that on-premise events flow to the cloud SIEM instead. The last approach is to combine cloud and on-premise events into a single big data platform. It provides more flexibility and a great user experience.

There are various mechanisms to fetch events:

- REST API calls
- Connectors by SIEM vendors
- Conversion to standard Syslog format

Skilling up your analysts and engineers is the key to success. Start by providing training on cloud concepts like IaaS, PaaS, and SaaS. You can begin with IaaS, as it is closer to on-premise, before moving on to PaaS, which is more complex than IaaS. Avoid fixating on specifics, embrace flexibility, find people who understand data, and keep learning.

To be successful in implementing a proper monitoring system, we have to configure it right. We can apply tools like the CIS benchmark for Azure to achieve this. Prioritisation is super critical. We have limited resources but hundreds of use cases. We can use threat modelling to prioritise monitoring scenarios and cut noise.

And last but not least, we cannot forget the importance of constantly scaling up team skills, designing the right SIEM architecture, and establishing a mechanism to keep up with new features in the cloud.
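
The alert routing described at the start of this section, where likely false positives go to the resource owner first and reach the SOC only when warranted, could be sketched as follows. Treating the configured "limit" as a severity threshold is my assumption, as are the field names severity and escalatedByOwner.

```javascript
// Assumption: the configured limit is a minimum severity for direct SOC handling.
const SEVERITY_RANK = { low: 1, medium: 2, high: 3 };

// Returns "soc" or "resource-owner" for a hypothetical alert shape
// { severity: "low" | "medium" | "high", escalatedByOwner?: boolean }.
function routeAlert(alert, socMinSeverity = "high") {
  const severeEnough = SEVERITY_RANK[alert.severity] >= SEVERITY_RANK[socMinSeverity];
  // A resource owner who thinks an alert needs investigation can push it to the SOC.
  const escalated = alert.escalatedByOwner === true;
  return severeEnough || escalated ? "soc" : "resource-owner";
}

console.log(routeAlert({ severity: "low" }));
console.log(routeAlert({ severity: "high" }));
console.log(routeAlert({ severity: "low", escalatedByOwner: true }));
```

Alerts with an unknown severity fall through to the resource owner in this sketch; a stricter setup might send them to the SOC instead.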

CoreOS Clair - Part ...

In modern software development practices, the use of containerisation is increasing significantly within DevOps cultures. Most of these environments benefit from the rich features provided by containerisation, such as scalability, portability, and process isolation. However, it is important to think about how secure a piece of software really is before we ship it to our clients. When we create container images as our releases, heavy use of third-party and outdated libraries can introduce additional vulnerabilities into the images we ship. Therefore, a solid way of scanning container images is needed. A few solutions are currently available on the market, and one of the most commonly used open source image scanning tools is CoreOS Clair.

Clair is an open source vulnerability scanning platform by CoreOS that provides static analysis of Docker container images. It can integrate directly with CoreOS (Red Hat) quay.io, Quay Enterprise, Dockyard, and many other registries.

How Does Clair Work?

Clair consists of 3 main components:

1. API server
2. PostgreSQL database
3. CVE data sources

Clair does not have a CLI or a UI to interact with; instead, the API server provides a REST API. Clair works with many CVE (Common Vulnerabilities and Exposures) sources, such as the Debian Security Bug Tracker, the Ubuntu CVE Tracker, and NIST NVD.

Built-in drivers:

- Debian Security Bug Tracker
- Ubuntu CVE Tracker
- Red Hat Security Data
- Oracle Linux Security Data
- Amazon Linux Security Advisories
- SUSE OVAL Descriptions
- Alpine SecDB
- NIST NVD

The CVE updater stays in sync with these sources and updates the vulnerability database. The API is used to interact with Clair to scan images and layers before updating the database. Clair performs static scanning, which means it scans the image and not a running container instance of it. Docker images consist of one or more layers, which are themselves images and are stored in the Docker registry as blobs.
So Clair analyses these layers, or a collection of multiple layers (an image).

Deployment Patterns

Clair can be deployed in different ways, but the most commonly used methods are Docker Compose and Kubernetes-based deployments:

1. Deploy in Kubernetes: Clair can be deployed easily with the Clair Helm chart (version 3.0.0-pre, a pre-release at the time of writing) or with the YAML files provided with the 2.0 release.
2. Using Docker Compose: a few commands are enough to run Clair in the Docker runtime.

Clair Client Integrations

Since Clair provides only a REST API, there are client integration tools for interacting with it. Most of them help us integrate static image scanning into our build and delivery pipelines.

Klar

Klar is a command-line tool that can be used to analyse Docker images stored in a private or public Docker registry, including ECR and GCR. It is a single binary file without any dependencies. Klar is configured through a set of environment variables and, when it runs, works between the Clair API server and the registry. The scan output can be captured as text or JSON, and can be used to list the detected vulnerabilities along with additional information about them.

Clairctl

Clairctl is also a lightweight command-line tool that works between the Clair API server and registries. Most importantly, Clairctl functions as a reverse proxy for authentication. It is also capable of generating HTML reports from the scan results.

Conclusion

It is an essential practice to incorporate security testing into the CI/CD pipelines of a DevOps environment. In modern software delivery platforms using containerisation, we must also find a way to scan container images before shipping them to clients and production environments. For this, we can use CoreOS Clair and other related tools that provide static scanning for container images.

What’s Next?
In part 2 of this session, I will cover the installation and set up of Clair with different deployment methods while exploring Klar and Clairctl tools. Till next time!

Overcoming Regulator...

Although REST APIs form a large part of the web-service based applications prevalent today, they come with certain limitations as use cases evolve: numerous security, architectural, and performance requirements need to be taken care of when designing a system for scale and reliability. Developing software projects for private WANs and for organisations with high security and regulatory requirements requires both public internet facing and private cloud facing applications. This often comes part and parcel with numerous non-functional requirements that put restrictions not only on the tech stack, but also on the overall architecture of the application.

Facts To Consider

1. Data is often stored in legacy on-premise data centres and cannot be moved to the cloud. These data centres often have limited bandwidth and high latency due to physical resource constraints and costs.
2. Due to their nature, applications on either network may go through unplanned downtime and maintenance, causing the other side to stop functioning.
3. Some processes involve multiple sub-processes, such as email or PDF generation, batch jobs, or interfacing with third-party or black-box systems, that take additional time. A simple HTTP REST call isn’t enough to manage such a delayed response.
4. Gathering data for log processing, data analytics, and audit logging should ideally not have a side effect on the performance of the overall system.

Choosing a Messaging Broker

With a number of open source messaging options available, choosing the right message broker for an application largely depends on its business case. Let’s compare two popular messaging brokers: RabbitMQ and Kafka.

RabbitMQ

- ‘Traditional’ messaging broker that supports established protocols such as AMQP 0.9.1 and AMQP 1.0.
- Simple to understand and use for applications requiring acknowledgment-based message delivery queues, dead letter exchanges for retrying consumers, and configurations for high availability, reliability, and security.
- Has built-in plugins for management, clustering, and distributed application setup.
- Easy to understand, use, and deploy.
- Typical use cases: fine-grained consistency control/guarantees on a per-message basis, point-to-point (P2P) and publish/subscribe messaging, support for legacy messaging protocols, queue/exchange federation, shovelling.

Kafka

- A relatively newer technology, Kafka is designed for high-volume publish-subscribe messages and streams, and is meant to be durable, fast, and scalable.
- Unlike RabbitMQ, Kafka retains messages, much like a persistent time-series database.
- Supports a large number of consumers and retains large amounts of data with very little overhead.
- Typical use cases: website activity tracking, metrics, log aggregation, stream processing, event sourcing, commit logs.

Getting Started with RabbitMQ using Docker

RabbitMQ is an open source, lightweight message broker written in Erlang, with client libraries for popular languages and platforms such as Java, .NET, JavaScript, and Erlang. It is based on the AMQP protocol and has several plugins that allow it to support the STOMP and MQTT protocols as well.

RabbitMQ comes with a plug-in platform for custom additions, with a pre-defined collection of supported plug-ins, including:

- A “Shovel” plug-in that takes care of moving or copying (replicating) messages from one broker to another.
- A “Federation” plug-in that enables efficient sharing of messages between brokers (at the exchange level).
- A “Management” plug-in that enables monitoring and control of brokers and clusters of brokers.

Being lightweight, RabbitMQ fits both cloud and on-premise deployments without adding much overhead.
RabbitMQ Architecture

RabbitMQ uses a “smart broker, dumb consumer” model, focused on consistent delivery of messages to consumers that consume at a roughly similar pace, with the broker keeping track of consumer state. RabbitMQ supports both synchronous and asynchronous messaging needs. Publishers send messages to exchanges, while consumers retrieve messages from queues. Decoupling producers from queues via exchanges ensures that producers aren’t burdened with hardcoded routing decisions.

Using Docker to Deploy RabbitMQ

RabbitMQ is available as a Docker image on Docker Store and comes in several versions, including beta, with or without the management plugin. This article uses 3.7.16-management for all further discussion (the commands use the rabbitmq:3-management tag, which tracks the latest 3.x release with the management plugin).

Pull and Run the Image

    $ docker run -d \
        --hostname my-rabbit \
        --name some-rabbit \
        -p 0.0.0.0:5672:5672 \
        -p 0.0.0.0:15672:15672 \
        rabbitmq:3-management

This will start RabbitMQ, by default, on port 5672 and the management plugin on port 15672. Navigating to http://localhost:15672 will lead to the RabbitMQ Admin portal, which shows an overview of the network traffic and allows the user to manage queues and exchanges. Developers can also use rabbitmqctl, a CLI tool, to manage RabbitMQ, although running RabbitMQ on Docker requires prefixing the command with docker exec.

Configuring Plugins

In order to configure plugins that will be initialised at run time, we need to mount the enabled_plugins file into the Docker container. The contents of the enabled_plugins file should look like:

    [rabbitmq_management,rabbitmq_management_agent,rabbitmq_consistent_hash_exchange,rabbitmq_federation,rabbitmq_federation_management,rabbitmq_shovel,rabbitmq_shovel_management].

One important thing to note here is the Erlang file syntax, which requires the file to end with a period (.). To make this file available within the Docker container, it needs to be mounted as a volume when running the container, using the -v flag.
docker run -d \ --name some-rabbit \ -e RABBITMQ_DEFAULT_USER=user \ -e RABBITMQ_DEFAULT_PASS=password \ -e RABBITMQ_DEFAULT_VHOST=/ \ -v enabled_plugins:/etc/rabbitmq/enabled_plugins \ -p 5672:5672 \ -p 15672:15672 \ rabbitmq:3-management NodeJS-RabbitMQ Tutorial 1. Publish a message — send.js var amqp = require('amqplib/callback_api'); amqp.connect('amqp://user:password@localhost', function(error0, connection) { if (error0) { throw error0; } connection.createChannel(function(error1, channel) { if (error1) { throw error1; } var queue = 'dummy_queue'; var msg = 'Hello World!'; channel.assertQueue(queue, { durable: false }); channel.sendToQueue(queue, Buffer.from(msg)); console.log(" [x] Sent %s", msg); }); setTimeout(function() { connection.close(); process.exit(0); }, 500); }); 2. Receive a message — receive.js var amqp = require('amqplib/callback_api'); amqp.connect('amqp://user:password@localhost', function(error0, connection) { if (error0) { throw error0; } connection.createChannel(function(error1, channel) { if (error1) { throw error1; } var queue = 'dummy_queue'; console.log(" [*] Waiting for messages in %s. To exit press CTRL+C", queue); channel.consume(queue, function(msg) { console.log(" [x] Received %s", msg.content.toString()); }, { noAck: true }); }); }); 3. Sending and Receiving messages — node receive.js Start the consumer — receive.js node src/receive.js [*] Waiting for messages in dummy_queue. To exit press CTRL+C Publish a message on dummy_queue node src/send.js [x] Sent Hello World! The consumer should log node src/receive [*] Waiting for messages in dummy_queue. To exit press CTRL+C [x] Received Hello World! Code examples in other popular languages are available on the official Github repo. RabbitMQ in an E-Commerce application Consider the following use case for an e-commerce website: 1. A customer makes a payment on the shopping site 2. Payment is confirmed, server generates invoice and sends the user an email. 3. 
Notifies the shipping company of the order.
4. Customer receives an email when the order is shipped.

Why using RabbitMQ is valid for this use case:

1. Under extreme load, email generation could take longer than usual. It doesn’t need to be sent immediately, so adding it to the queue will ensure that this email will be delivered.
2. The shipping company could have a limit to the number of orders they can process. Publishing to the queue will ensure all orders are picked up and acknowledged.
3. Once the order is shipped, a message from the shipping department will notify the user of shipment.

RabbitMQ between Public Cloud and Private Cloud Zones

1. Customer places an order and the backend successfully processes the credit card payment.
2. Upon payment success, the server adds two messages to the queues send_email and shipping respectively with the order_id.
3. The send_email consumer receives the message and sends a PDF copy of the invoice to the customer.
4. Shipping department consumes the order from the shipping queue and initiates the delivery process.
5. Shipping notifies the shipping_initiated queue that the order has been shipped. A backend consumer watching this queue sends a shipping notification to the customer.

Queue Shovelling

Consider the scenario of failure on the private cloud side. The entire machine hosting the application is lost and is out of service. Along with the application server, the RabbitMQ server also stops working. Customer orders placed in this situation will not be processed since the queue is dead.

In situations similar to the one described above, it is necessary to reliably and continually move messages from a source (e.g. a queue) in one broker to a destination in another broker (e.g. an exchange). The Shovel plugin on RabbitMQ allows you to configure a number of shovels, which do just that and start automatically when the broker starts. The Shovel plugin is shipped with RabbitMQ and can be enabled while configuring RabbitMQ.
Queue shovelling can be configured both statically, in the broker configuration file, and dynamically at run time, using the REST API or rabbitmqctl.

Dynamic Shovels

    docker exec some-rabbit \
        rabbitmqctl set_parameter shovel my-shovel \
        '{
          "src-protocol": "amqp091",
          "src-uri": "amqp://",
          "src-queue": "my-queue",
          "dest-protocol": "amqp091",
          "dest-uri": "amqp://remote-server",
          "dest-queue": "another-queue"
        }'

src-protocol and dest-protocol refer to the AMQP protocol version being used on the source and destination respectively, either amqp091 or amqp10. If omitted, it will default to amqp091. src-uri and dest-uri are the connection URIs of the respective RabbitMQ servers. Similarly, src-queue and dest-queue refer to the source and destination queues on each server. For detailed documentation, please refer to the official documentation available at: https://www.rabbitmq.com/shovel-dynamic.html

Advantages of Shovelling

Loose coupling
A shovel can move messages between brokers (or clusters) in different administrative domains:
- with different users and virtual hosts;
- that run on different versions of RabbitMQ and Erlang;
- using different broker products.

WAN-friendly
The Shovel plugin uses client connections to communicate between brokers, and is designed to tolerate intermittent connectivity without message loss.

Highly tailorable
When a shovel connects either to the source or the destination, it can be configured to perform any number of explicit methods. For example, the source queue need not exist initially and can be declared on connect (AMQP 0.9.1 only).

Supports multiple protocols
Starting from RabbitMQ 3.7.0, the shovel has support for multiple implemented protocols. They are AMQP 0.9.1 and AMQP 1.0. This means it is possible to shovel, e.g. from an AMQP 1.0 broker source to a RabbitMQ destination and vice versa. Further protocols may be implemented in the future.

A shovel is essentially a simple pump.
Each shovel:
- connects to the source broker and the destination broker,
- consumes messages from the queue,
- re-publishes each message to the destination broker (using, by default, the original exchange name and routing_key when applicable).

With the addition of shovelling:

1. If the private cloud website (in this case the shipping dept dashboard) goes offline, customers can still continue to place orders, which will reside on the public website RabbitMQ instance until being shovelled to the private cloud queue.
2. Similarly, when the internet website goes offline, customers will receive shipping notifications from the server whenever the service is back up.
3. Loose coupling between network layers.
4. Each RabbitMQ server can be upscaled and downscaled separately according to load.
5. Shovelling can be secured using TLS, thus becoming a secure exchange between two networks.

In addition to shovelling, RabbitMQ also supports Clustering and Federation (both exchanges and queues) for developers to construct a more robust, distributed, and highly available RabbitMQ setup.

Clustering

As the name suggests, clustering is grouping multiple RabbitMQ servers to share resources and hence increase the processing power of the broker. All data/state required for the operation of a RabbitMQ broker is replicated across all nodes. The exception is message queues, which by default reside on one node, though they are visible and reachable from all nodes.

Federation

The federation plugin allows you to make exchanges and queues federated. A federated exchange or queue can receive messages from one or more upstreams (remote exchanges and queues on other brokers). To put it simply, the federation plugin allows transmission of messages between queues without requiring clustering. This can give developers more granular control over the performance of the system, as they can decide how to federate a high-load queue.
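The "simple pump" behaviour of a shovel described above can be illustrated with a minimal, in-memory sketch. This is not the RabbitMQ API: FakeBroker and shovelOnce are hypothetical names invented for the illustration, and the only assumption is that a message is removed from the source after it has been re-published successfully, so nothing is lost while the destination is unreachable.

```javascript
// In-memory stand-in for a broker; hypothetical, NOT the RabbitMQ API.
class FakeBroker {
  constructor(name) {
    this.name = name;
    this.online = true;
    this.queues = new Map(); // queue name -> array of messages
  }
  publish(queue, msg) {
    if (!this.online) throw new Error(this.name + ' is unreachable');
    if (!this.queues.has(queue)) this.queues.set(queue, []);
    this.queues.get(queue).push(msg);
  }
  depth(queue) {
    return (this.queues.get(queue) || []).length;
  }
  peek(queue) {
    return (this.queues.get(queue) || [])[0];
  }
  ack(queue) {
    this.queues.get(queue).shift(); // drop the message at the head of the queue
  }
}

// One pass of a hypothetical shovel: pump messages from source to destination
// until the source queue is empty or the destination cannot be reached.
function shovelOnce(src, srcQueue, dest, destQueue) {
  let moved = 0;
  while (src.depth(srcQueue) > 0) {
    const msg = src.peek(srcQueue);
    try {
      dest.publish(destQueue, msg);
    } catch (err) {
      break; // destination offline: the message stays on the source
    }
    src.ack(srcQueue); // remove only after a successful re-publish
    moved++;
  }
  return moved;
}

const publicBroker = new FakeBroker('public');
const privateBroker = new FakeBroker('private');

// Orders keep arriving while the private side is down...
privateBroker.online = false;
publicBroker.publish('shipping', 'order-1');
publicBroker.publish('shipping', 'order-2');
const movedWhileDown = shovelOnce(publicBroker, 'shipping', privateBroker, 'shipping');

// ...and are delivered, in order, once it comes back.
privateBroker.online = true;
const movedAfterRecovery = shovelOnce(publicBroker, 'shipping', privateBroker, 'shipping');

console.log(movedWhileDown, movedAfterRecovery, privateBroker.depth('shipping'));
```

The real plugin does considerably more (reconnection, acknowledgement modes, protocol translation), but the ordering of "re-publish first, acknowledge second" is what lets orders accumulate safely on the public side during an outage.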
In a nutshell, adding a messaging layer to application architecture has the following high-level advantages:

Increased Reliability
By decoupling different components with a messaging layer, system architects can create a more fault-tolerant system. Should one part of the system go offline, the others can still continue to interact with the queue and pick up where the system left off. The queue itself can also be mirrored for even more reliability.

Better Performance
Queues add asynchronicity to the application and allow heavy processing jobs to be offloaded without blocking the user experience. Consumers process messages only when they are available. No component in the system is ever stalled waiting for another, optimising data flow.

Better Scalability
Based on the active load, different components of the message queue can be scaled accordingly, giving developers a better handle on overall system operating cost and performance.

Developing (or Migrating to) Microservices
Microservices integration patterns that are based on events and asynchronous messaging optimise scalability and resiliency. Message queues are used to coordinate multiple microservices, notify microservices of data changes, or as an event firehose to process IoT, social, and real-time data.

The immediate disadvantage of migrating to a messaging architecture is the loss of a synchronous request-response mechanism, because messaging introduces a slight delay in the publish/subscribe process. Carefully planning and evaluating the needs of the system can help achieve a fine balance between the two.

References
1. Official RabbitMQ documentation
2. https://content.pivotal.io/blog/understanding-when-to-use-rabbitmq-or-apache-kafka

Overcoming Common Hu...

Not very long ago, I was working on a Spring Boot project where we were automating an insurance claims submission process. The client had given certain non-functional requirements for retrying certain tasks and for recovery scenarios. To visualise the whole story, I drew the business flow using BPMN notation on a piece of paper. And then it hit me… why don’t I use a process engine?

I had used two process engines before: Activiti and Appway. As Activiti was open source and built on Java, it was my natural choice. However, I faced a roadblock when I found out that the latest native Activiti was not compatible with Spring Boot 2.x. Thankfully, I found out about Flowable! (www.flowable.org)

In this blog post, I will list some of the problems I faced while getting started with Flowable and present some of the solutions I used to navigate these challenges.

Problem 1: How To Model The Business Flow Using Flowable

Flowable has its own web app that can be used to model the BPMN flow. You can download the war files from https://www.flowable.org/downloads.html. At the time of writing, Flowable 6.4.1 is the latest. Once you have downloaded and unzipped the archive, there will be a folder called “wars”. Run “flowable-idm.war”, followed by “flowable-modeler.war”, using the command “java -jar”. Once both applications are running, simply browse to http://localhost:8888/flowable-modeler/#/processes in your browser. You can log in to the application using “admin” as the user name and “test” as the password.

Problem 2: How To Restrict Flowable From Creating Tables Every Time Our Application Boots Up

This was actually not very complex, as the answer was in the Flowable documentation. Flowable uses a property called flowable.database-schema-update to determine what it needs to do with the tables during boot up. By default, its value is true. When this property is true, Flowable will detect if the tables are already created and, if so, will not run the SQL.
So for our project, I just had to figure out what SQL scripts were running in Flowable. Once I knew that, I just had to create Liquibase scripts that can be managed by our application. The following image shows the path of the SQL scripts (highlighted in green) used by Flowable.

Problem 3: Autowiring Was Not Working In Service Tasks

After I finished modelling my flow in Flowable, it was time to wire the service tasks added in the model to the actual JavaDelegate classes of the application. I first tried giving the fully qualified class name in the Service Task class property. The class TransferToSftpTask implemented the Flowable JavaDelegate interface. The problem with this approach was that the dependencies I was autowiring into that class were never injected, and hence my business logic was failing.

After some research, I found that rather than using the class property, I had to use the delegateExpression property and point it to the bean in the Spring application context that implemented JavaDelegate. Since the delegate expression points to a bean, we need to annotate TransferToSftpTask as a Component so that Spring creates the bean and registers it in the application context.

Problem 4: The Flowable Process Was Not Starting

As part of my business flow, I had to copy some files from an S3 bucket to an SFTP folder. I thought of using Spring Integration to achieve this. As soon as I brought in the Spring Integration dependency, I discovered that my process was not triggering. It’s important to mention that all my service tasks were marked as asynchronous.

When we have both Flowable and spring-boot-starter-integration in our application, both try to create beans of the TaskExecutor interface. Spring Integration creates a bean of type ThreadPoolTaskScheduler, while Flowable tries to create a bean named springAsyncTaskExecutor of type TaskExecutor if there are no beans of that type available. Spring Boot WebMvc tries to auto-configure a ThreadPoolTaskExecutor, a type of TaskExecutor.
When you don’t have Spring Integration, Flowable uses this ThreadPoolTaskExecutor. When you do have Spring Integration, Flowable gets confused as there are two beans of the same type, and tries to resolve the dependency by determining whether one is primary or has a higher priority. When it finds neither, starting a process fails. Spring Boot WebMvc injects this TaskExecutor through TaskExecutionAutoConfiguration. Thus, all we needed to do was exclude TaskExecutionAutoConfiguration from our Spring Boot auto-configuration.

Problem 5: Configuring Retry Mechanism

In its default configuration, Flowable reruns a job three times if there’s an exception in the job execution. Our clients, on the other hand, wanted the number of retries and the delay between two retries reconfigured. This can be achieved in Flowable by autowiring ProcessEngineConfiguration and then introducing the following two lines. The method setAsyncFailedJobWaitTime sets the delay in seconds, while setAsyncExecutorNumberOfRetries sets the number of retries.

The only thing that we should keep in mind here is that the above code sets the configuration as a whole. If we need more granular configuration, such as a different retry count or delay for a specific task, this can be achieved by adding the following tag in the process XML under the service task tag.

    <extensionElements>
      <flowable:failedJobRetryTimeCycle>R5/PT7M</flowable:failedJobRetryTimeCycle>
    </extensionElements>

The time cycle expression follows the ISO 8601 standard, just like timer event expressions. The example above makes the job executor retry the job 5 times and wait 7 minutes before each retry. This expression cannot be added through the modeller yet, so after the process is modelled and downloaded as XML, the above expression needs to be added manually.

Problem 6: Only Retry The Task That Failed

Our business flow was quite complex. It had two subprocesses running in parallel.
Each subprocess has a series of tasks. To keep things simple, let us consider a business flow that has 3 service tasks in sequence. If there was an exception on the third task, Flowable would retry the flow again from the first task. Though this behaviour is okay for some business flows, it was a problem for the one we had.

To resolve this problem, we had to mark each task as asynchronous in the modeller. When tasks are marked as asynchronous, Flowable runs them as separate jobs. Thus, when an asynchronous task is executed, Flowable will commit the transaction and the job-related progress will be visible to other components in the database. So now if an exception is thrown in the second task, the transaction associated with the first task has already been committed, and only the transaction associated with the second task is rolled back.

When the tasks are not marked asynchronous, the same transaction is associated with all tasks. Thus, an exception in the second task will also roll back the changes made by the first task. After marking the tasks as asynchronous, the retry mechanism only re-triggered the task where the exception was thrown and not any task before it.

Problem 7: Logging Error-Related Information In The Database

This is an extension of the earlier problem I shared. However, once you understand how Flowable manages the transaction, the solution becomes clear. When an exception is thrown in one of the service tasks, we wanted to log an error in our database. But since Flowable was rolling back the transaction, our database save was not persisted.

The solution to this problem was simple. We called one more service to persist the data using a REQUIRES_NEW transaction type. This creates a new transaction and suspends the outer transaction. When this method exits, the new transaction is committed. So now if we throw an exception at the task level, the outer transaction rolls back, while the error log gets persisted in the database.
Well, that’s it for my blog post! If you’re considering giving Flowable a try or are experiencing similar challenges, I hope this will give you a better understanding of how Flowable works.

Functional Programmi...

I’ve been exploring functional programming with Scala and its ecosystem for the past few months. In this post, I’ll highlight some of the features of the language that enable intuitive creation of functional code for distributed systems and data operations.

Higher Order Functions

As per the official documentation, functions are first-class objects in Scala, which means that they can:
- Take another function as an argument, or …
- Return a function

An example of a function taking another function as an argument is the map() function in Scala's standard collections library.

    val examplelist: List[Int] = List(2,9,8,14)
    examplelist.map(x => x * 2) // anonymous function as argument

When working with standard Scala collections, it’s also very intuitive to chain operators, especially with the infix notation. In the small code example below, I’m defining a list of numbers from 1 to 20, filtering on even numbers and then summing them up.

    (1 to 20).toList filter (_ % 2 == 0) reduce (_ + _)

The _ is the wildcard operator: in the case of maps and filters, it refers to the value in the collection.

Recursion

The recommended way to do operations on all the items in a collection is to use the operators map, flatMap, or reduce. In case those operators don’t meet a use case’s requirements, it’s very useful to write a tail-recursive function to operate on all the items in a collection. The code example below shows a tail-recursive function definition to compute the factorial of a number.
    import scala.annotation.tailrec

    // Factorial function implementation that uses tail recursion
    @tailrec
    def factorial(in_x: Double, prodsofar: Double = 1.0): Double = {
      if (in_x == 0) prodsofar
      else factorial(in_x - 1, prodsofar * in_x)
    }

    factorial(5)

In Scala, a tail-recursive function as shown above is optimised by the compiler to occupy just 1 stack frame, so there's no chance of a StackOverflowError even for many levels of recursion. The @tailrec annotation makes compilation fail if the function is not actually tail-recursive. This is possible out-of-the-box, without any need for frameworks or plugins.

As mentioned above, the recommended way is to use collection operators (such as reduce, etc.). As a demo of the ease of use of the collections API, the above factorial function can also be implemented by the 1-liner below:

    (1 to 5).toList reduce (_ * _)

To conceptually understand reduce, check out this great link! (Also do check out the explanations of foldLeft, foldRight, map, flatMap to understand some commonly used data operations!)

Case Classes

Case classes can be instantiated very easily with no boilerplate code, as in the example below.

    case class BusinessTransaction(
      sourceaccountid: Long,
      targetaccountid: Long,
      amount: Long
    )

    // create some transactions now to demo case classes
    // (identifiers cannot start with a digit, hence xaction1 rather than 1_xaction)
    // I lend my friend
    val xaction1 = BusinessTransaction(112333L, 998882L, 20L)
    // My friend pays me back
    val xaction2 = BusinessTransaction(998882L, 112333L, 20L)

Just 1 case class .. line above does the following useful things:
- Defines the 3 immutable values sourceaccountid, targetaccountid and amount
- Defines get methods to access the constructor arguments (eg: xaction1.amount)

Beyond their ease of use, case classes are the recommended way to store immutable data instances in Scala. For example, in a big data application, each line of a large datafile can be modelled by a case class and stored. An example of the use of a case class to store data is here.
In the linked example, the function rawPostings models each line of the datafile as an instance of case class Posting. It will eventually return a dataset of type RDD[Posting].

Pattern Matching

In Scala, objects such as case classes, regular classes, and collections can be decomposed through pattern matching. Essentially, this means that you can use pattern matching to:
- Decompose an object’s type (example below)
- Get the head of a collection (such as a List or a Seq)

The code example below shows how to use pattern matching to decompose a Seq.

    val seq1: Seq[Int] = Seq(1,3,4,5,5)
    seq1 match {
      case x::y => println(s"The first element in the sequence is ${x}")
      case Nil => println("The sequence is empty")
    }

The cons operator (::) creates a list made of the head (x) and the rest of the list (called the tail, y).

Companion Objects

In OOP, a static variable is sometimes used in a class to store state or property across multiple instantiated objects. However, there is no static keyword in Scala. Instead, what we use are Companion Objects aka Singleton Objects. A Companion Object is defined using the object keyword and has the exact same name as its accompanying class.

The companion objects can define immutable values, which can then be referenced by methods in the class. There are 2 common patterns to use companion objects in Scala:
- As a factory method
- To provide functionality that is common to the class (i.e.
static function in Java)

    // The 'val specie' straightaway defines an immutable class parameter
    abstract class Animal(val specie: String) {
      import Animal._
      // Common behaviour to be mixed-in to Canine/Feline classes
      def getConnectionParameters: String = Animal.connectionParameter
    }

    object Animal {
      // .apply() is the factory method
      def apply(specie: String): Animal = specie match {
        case "dog" => new Canine(specie)
        case "cat" => new Feline(specie)
      }
      val connectionParameter: String = System.getProperty("user.dir")
    }

    class Canine(override val specie: String) extends Animal(specie) {
      override def toString: String = s"Canine of specie ${specie}"
    }

    class Feline(override val specie: String) extends Animal(specie) {
      override def toString: String = s"Feline of specie ${specie}"
    }

    // syntactic sugar, where we don't have to say new Animal
    val doggy = Animal("dog")
    val kitty = Animal("cat")
    doggy.getConnectionParameters

Options

Most application code checks for Null/None types. Null types are handled a little differently in Scala: the construct used is called an Option. This is best demonstrated with an example.

    val customermap: Map[Int, String] = Map(
      11 -> "CustomerA", 22 -> "CustomerB", 33 -> "CustomerC"
    )

    customermap.get(11)                // Map's get() returns an Option[String]
    customermap.get(11).get            // Option's get returns the String
    customermap.get(999).get           // Will throw a NoSuchElementException
    customermap.get(999).getOrElse(0)  // Will return a 0 instead of throwing an exception

In a language like Python, if None: checks would be quite common throughout the codebase. In Java, there would be try-catch blocks that would handle thrown exceptions. Options allow for focusing on the logic flow with minimal diversions for type or exception checks.

A standard way of using Options in Scala is for your custom functions to return Option[String] (or Int, Long, etc.).
Let's look at the Map structure's get() function signature:

    def get(key: A): Option[B]

One (intuitive) way to use this is to chain it with the getOrElse() function as shown below:

    // Map of IDs vs Names
    val customermap: Map[Int, String] = Map(
      11 -> "CustomerA", 22 -> "CustomerB", 33 -> "CustomerC"
    )
    customermap.get(11).getOrElse("No customer found for the provided ID")

A very useful way of using Options is together with a collection operator like flatMap, which handles the types for you transparently.

    // Map of IDs vs Names
    val customermap: Map[Int, String] = Map(
      11 -> "CustomerA", 22 -> "CustomerB", 33 -> "CustomerC"
    )
    val listofids: List[Int] = List(11,22,33,99)
    listofids flatMap (id => customermap.get(id)) // flatMap magic

And that’s it from me! My next excursion is to explore concurrent systems with Akka and the Actor model. Look out for a future post, where I’ll share my learnings on that topic (and its relationship to Scala’s approach to functional programming).

Originally published at http://github.com.

Speed Up Your JavaSc...

For a recent personal project, I only needed a fairly simple Node.js server to run some exponentially costly computing tasks. To be honest, I could have switched the entire tech stack, but I estimated that the development time of such a choice wasn’t worth it… Still, I had some functions taking ages to compute. So I had a look around, and decided to let that task be handled by a more appropriate language, in this case Rust.

This choice let me give the right task to the right language: my otherwise simple server just had to handle routes and calls, which Node does with ease, while all the tedious calculation went to Rust.

What is Rust?

Rust is a low-level, safety-focused language designed by Mozilla, and it has topped the StackOverflow Developer Survey as the most loved programming language for four years in a row. It runs blazingly fast when compared to most other languages, with performance often on par with C. It’s neither a functional nor an object-oriented language, and its syntax is close to C++. Its npm equivalent is called Cargo, and the packages are named crates.

How Do You Mix Rust with NodeJs?

Fortunately for me, I wasn’t the first person who wished to mix Rust with NodeJs. This has been handled by far more talented people, through what is called a Foreign Function Interface (FFI) and dynamic libraries (the files that end in .dylib, for example). This allows a program that runs in one language (the host language) to call functions written in another (the guest language), just as it would call a library of the host language. And inside the guest language's functions, you have access to any useful third-party library of the guest language!

So How Do We Write It?

Let’s get started with the basics then.
First we will need Rust and Cargo:

    curl https://sh.rustup.rs -sSf | sh

Once we’re done, we can create a new project, in this case a library:

    cargo new --lib <libraryname>

This sets up a new directory with a src folder and a Cargo.toml (the equivalent of package.json).

Now that we’re set up, let’s write our code. For this example, in order to keep it simple yet explicit, we’ll just create a recursive Fibonacci number function. It is very simple, commonly used for benchmarking (it runs in O(2^n)), and relies on deep recursion, which JavaScript handles poorly (very few engines implement tail-call optimisation, and even fewer browsers do). So let’s open src/lib.rs and write our function:

    // Plain Rust version (keep only one of the two: they share the same name)
    fn fibonacci(n: i32) -> i32 {
        if n <= 2 {
            return 1
        }
        return fibonacci(n-1) + fibonacci(n-2);
    }

    // Version exported from the dynamic library;
    // i32 matches the 'int' type declared on the JavaScript side
    #[no_mangle]
    pub extern "C" fn fibonacci(n: i32) -> i32 {
        if n <= 2 {
            return 1
        }
        return fibonacci(n-1) + fibonacci(n-2);
    }

The first function would be how to write it if it was destined for the same Rust program. However, we are building a dynamic library, so we need to make a few changes that I am going to review:

#[no_mangle]
This line is an attribute: it gives the compiler instructions at compile time. In this case, we prevent the compiler from changing the name of the function through name mangling. In short, name mangling is the way your compiler renames functions in order to make sure callers reach the correct one (differentiating List.get() from Array.get(), for example). The output often looks like _Z4xfunction1HgF7jffibonacci, for example. But this would be the name we have to call from Node, so we want to keep it simple.

Then we have pub. This means that the function is publicly available to use, so we can call it from outside this module.

Finally, the extern "C". This indicates that we are using the C ABI (Application Binary Interface). In our case, we could shorten this to plain extern, as "C" is the default ABI for extern functions.
The ABI string can also be used to target other foreign calling conventions, such as those of the Windows API.

We can then compile and try it within a small Node app. For Cargo to emit a dynamic library, Cargo.toml first needs a [lib] section with crate-type = ["cdylib"]. We’ll build with the --release flag as we want Rust to optimise the binary (without this flag, Cargo builds in debug mode, which can be surprisingly slow).

    cargo build --release

This will create a .dylib file under ./target/release (on macOS; the extension is .so on Linux and .dll on Windows).

In a Node app, let’s try to call our function. For this, we will need node-ffi:

    npm i ffi

In our code, let’s import our dynamic library. This will result in a var, similarly to a require(). After giving the path to the library, we also need to indicate the functions we wish to import from this library, specifying the return type and the parameters they take.

    var lib = ffi.Library(path.join(__dirname, './target/release/libdemo-rust-node.dylib'), {
      fibonacci: ['int', ['int']]
    });

A function like pow, taking a double and an integer and returning a double, would be imported like this:

    pow: [ 'double', [ 'double', 'int' ] ]

We’ll declare an equivalent function in JS and call both of them with console.time to benchmark them:

    var ffi = require('ffi');
    var path = require('path')

    var lib = ffi.Library(path.join(__dirname, './target/release/libdemo-rust-node.dylib'), {
      fibonacci: ['int', ['int']],
    });

    function fibonacci(n) {
      if (n <= 2) {
        return 1;
      }
      return fibonacci(n - 1) + fibonacci(n - 2);
    }

    console.time()
    var rustFibonacci = lib.fibonacci(30);
    console.timeEnd()

    console.time()
    var nodeFibonacci = fibonacci(30);
    console.timeEnd()

    console.log(rustFibonacci, nodeFibonacci)

Let’s run it:

    user$ node index.js
    default: 2.850ms
    default: 10.805ms
    832040 832040

As we can see, both returned the same result. However, we can see a noticeable difference in computing time. Keep in mind, however, that this microbenchmark does not account for the loading time of the library.

Restrictions

There are still some restrictions to using FFIs.
First, keep in mind that an FFI call is very costly… as stated in the readme of node-ffi, for example:

    There is non-trivial overhead associated with FFI calls. Comparing a hard-coded binding version of strtoul() to an FFI version of strtoul() shows that the native hard-coded binding is orders of magnitude faster. So don't just use the C version of a function just because it's faster. There's a significant cost in FFI calls, so make them worth it.

Loading a dynamic library also comes at a cost, so you might not reach the performance you expected. Also, if you’re only after low-level, optimised code, the best you can do is load the library once and reuse it for many calls; otherwise a C extension would be better.

A note: this example being trivial, it only uses simple integer types in the functions. If you’re looking to work with JavaScript objects and types directly in Rust, have a look at Neon.

More importantly, you can’t make code running in a browser handle such calls…

How About Front-End Then?

You might have heard about WebAssembly (wasm). It is a stack-based virtual machine that aims to run code compiled from C/C++ or other fast languages at near-native speed alongside JavaScript. In effect, it achieves what we did previously, but at a higher level and using cross-language standards.

Rust treats WebAssembly as a core build target. You can write and publish your npm module in Rust. You can also install it and run it through most module bundlers, though the popular Webpack is the most documented of them all. Let’s have a quick tour of how to proceed with the previous example.

First we install wasm-pack in order to compile and produce an npm package from our code.

    $ cargo install wasm-pack

Also, in order to publish our package, we’ll assume you have an npm account already set up.
Then, let’s create the project:

    $ cargo new --lib wasm-fibo
    Created library `wasm-fibo` project

In the Cargo.toml, we need to add a few things:

    [lib]
    crate-type = ["cdylib"]

    [dependencies]
    wasm-bindgen = "0.2"

In the newly generated src/lib.rs, let’s write our function:

    extern crate wasm_bindgen;
    use wasm_bindgen::prelude::*;

    #[wasm_bindgen]
    pub fn fibonacci(n: i32) -> i32 {
        // Same base case as the FFI version, so both compute the same sequence
        if n <= 2 {
            return 1
        } else {
            return fibonacci(n - 1) + fibonacci(n - 2)
        }
    }

We are using wasm_bindgen, a bridge between Rust and JavaScript. It will take care of the mangling problem, but we still need to declare our function as publicly available with the pub keyword in front.

Let’s build our package now:

    $ wasm-pack build [--scope <mynpmusername>] --release

That creates a pkg directory at the root with a lot of different files in it. They bundle everything there is to know about the package: which functions it exports, what arguments they require, and what types they return.

Now, let’s publish it and use it:

    $ cd ./pkg && npm publish

Now, in our Webpack application, we’ll just have to install it through npm:

    $ npm i [@<mynpmusername>/]wasm-fibo

(If you published it with your username in scope, you will need to carry it in all imports.) Done! Now we can use it as any other npm package, here with the ES6 syntax:

    import { fibonacci } from "wasm-fibo";
    console.log('This is wasmfresult : ', fibonacci(23));

To Conclude

FFIs and WebAssembly are two practical solutions for faster processing without a huge cost in development time and comfort. They give you more time to develop in a higher-level language, while still having the right tool handle the right work, and they give you access to libraries that don’t exist in your host language.

Between the two, the nuances can be subtle. In short, WebAssembly is platform-agnostic: it runs in browsers, servers, inside PHP, anywhere. WebAssembly exports symbols, like functions, or memories. You can call those functions from the outside.
Beyond the choice of running environment, a wasm module makes it easy to interact with JavaScript functions such as the console or alert(), but it also has limitations, including those of the bundler and of the browser when it runs on the front end. Unless the module is carefully designed, the run-time gain for a single call is usually very small, and a wasm call is not as fast as an FFI call (library loading aside). In both cases, calling an external function has a cost. A single call to a very "heavy" function will often be worth making through an FFI, whereas a WebAssembly solution starts to pay off from at least a moderate number of calls.
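Since the pay-off depends on how heavy each call is and how many calls are made, it can help to measure a plain JavaScript baseline before reaching for an FFI or a wasm module. The snippet below is only a minimal sketch for Node.js: fibonacciJs mirrors the convention of the Rust function above (both fibonacci(0) and fibonacci(1) return 1), and averageMs is a hypothetical helper introduced here for illustration, not part of any library.

```javascript
// Pure-JavaScript baseline, same convention as the Rust example:
// fibonacciJs(0) and fibonacciJs(1) both return 1.
function fibonacciJs(n) {
  return n < 2 ? 1 : fibonacciJs(n - 1) + fibonacciJs(n - 2);
}

// Hypothetical helper (an assumption, not a library function):
// average wall-clock time in milliseconds of fn(arg) over `runs` calls.
function averageMs(fn, arg, runs = 20) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < runs; i++) {
    fn(arg);
  }
  const elapsedNs = process.hrtime.bigint() - start;
  return Number(elapsedNs) / 1e6 / runs;
}

console.log('fibonacciJs(23) =', fibonacciJs(23)); // 46368
console.log('average ms per call:', averageMs(fibonacciJs, 23).toFixed(3));
```

Passing the imported wasm fibonacci (or an FFI-bound function) to the same averageMs helper gives a like-for-like comparison; timings vary by machine, so no expected figures are given here.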

How I Get Projects G...

In this article, I'll be demonstrating my workflow to get started on a project: setting up an AWS EMR cluster using a Cloudformation template. I'll first introduce both the Spark app and the Cloudformation template I'll be using. I'll then deploy my demonstration Spark app's Assembly Jar to an S3 bucket, before running the app on the EMR cluster. Finally, I'll query the Hive external table created on the cluster using Hue. As a managed service, I find EMR to be a great option to spin up a cluster and get started right away, and it provides powerful computing functionality as well as flexible storage options.

Before I go on, here are the dependencies I need:
* AWS account with root access (preferably)
* IDE and Unix Terminal
* The AWS EMR Pricing guide, to determine the most cost-effective region to host my EMR cluster

About The Spark App

For this blog post, I'll use my simple demonstration app hosted on Github at simple-spark-project, which will:
* Read an Apache web server log file into an RDD[String] structure
* Define a schema, and convert the above RDD[String] into a DataFrame
* Create a temp view and then an external Hive table to be stored in an S3 bucket

The app takes 3 input parameters:
* S3 URL of the Apache web server log file
* S3 URL of the output S3 bucket
* The spark.master configuration (which will be 'local' in this example)

The build configuration file in the project defines the libraryDependencies and the assemblyMergeStrategy used to build the Assembly/Uber Jar, which is what will be executed on the EMR cluster.

Build Infrastructure Using Cloudformation

The Cloudformation template I'll use defines and configures:
* IAM Roles to deploy and read to/from S3 buckets
* IAM Users to read to/from S3 buckets
* 3 x S3 buckets to host the Assembly Jar, the Hive external table and the EMR cluster log
* The EMR Cluster

Fig. 1: Cloudformation template

To create the Stack, I'll follow these steps: 1.
Navigate to Services > Compute > EC2 > Key pairs and create a Key Pair. Download the .pem file. 2. Navigate to the Cloudformation service from the AWS console 3. Open Cloudformation Designer (via the button ‘Design template’) 4. Open the template 5. Validate and Create the Stack The template will output the KeyId and the SecretKey of the newly created IAM users, who will be named DataArchUser-* and DataArchAdmin-*. Create Steps On The EMR Cluster To create a Step on the cluster, I’ll navigate to Services > EMR > Clusters and add a Spark application step in the ‘Steps’ tab of my cluster. Fig. 2 — Step configuration to add Spark application Check Output Hive External Table Once the Spark app is completed, I can query the final Hive external table in Hue using HiveQL. Fig. 3 — Querying the external table in Hive And that’s it! I now have a working cluster that I can now use to develop and run more complex applications. This isn’t a production-grade cluster, but it is one you can quickly spin up to begin work on a new project.

Pair-Programming: A ...

In the blog post Pair Programming: A Developer’s Perspective, Mingwei helpfully teased out the benefits and pitfalls of the practice of Pair Programming. Typically, Pair Programming is viewed as the Development Team’s concern -the kind of stuff only techies would and should care about. So, if you happen to be the Scrum Master and your team members are advocating this, you’ll have your work cut out for you. But still work that is worth your time, nonetheless. So here are a Scrum Master’s perspectives on Pair Programming. Who Are You and How Are You Involved ? You are the Scrum Master. The servant-leader who coaches the Development Team in self-organisation. You are adamant Pair Programming is good for delivery in the long run. But if the team is not considering this, you worry that enforcing this practice will interfere with their autonomy. After all, you have the Scrum rules carved into your heart. Remember that bit about “No one (not even the Scrum Master) tells the Development Team how to turn Product Backlog into increments of potentially releasable functionality?” To push for Pair Programming is similar to telling them just that. You fear breaking the Scrum rules. Your conscience is killing you. All is not lost, though. There is another point we can make about a Scrum Master’s service to the organisation, namely “causing change that increases the productivity of the Scrum Team”. If you are convinced Pair Programming increases productivity, it would be good to first identify what concrete productivity problems are happening, and connect that to how Pair Programming can help, while keeping in mind our principle is to be as lightweight as possible in our process. It is not about mindlessly implementing every promoted idea and selling them as “good practice.” “If there is no problem to be solved, then there is no point implementing a solution.” Pair Programming is a solution to problems. 
The one over-arching smell which Pair Programming might help (note the word “might” because problems could run deeper) is that of a silo’ed knowledge of the domain/system/code within the team. This smell manifests itself in symptoms like: bottleneck in team due to high inter-dependency between team members uneven spread of work within the team in an iteration/sprint fear of changing unfamiliar parts of the system drop in team’s engineering practices over time inability of some team members to estimate a piece of work lack of confidence in supporting a production system low bus factor in the team (also known as panic in the team when a critical team member goes on a really long vacation or worst — “you better be on call while holidaying in a beautiful remote island.” ) Circling back to the Scrum Master’s role, it is true that you can’t really enforce anything else apart from the Scrum rules. But there is nothing stopping you from influencing and encouraging the Development Team’s experimentation with Pair Programming in order to be more productive, especially when you are able to connect Pair Programming as a solution to productivity-impeding causes. Even if you fail, continue to listen to understand their concerns and objections. Give the Development Team time to digest and try again some time later. It’s a huge weight off your chest if you find yourself in the enviable position of having team members who are keen on trying Pair Programming. After all, this is an initiative from the Development Team. But don’t assume this practice will be welcomed into the team’s way of working. Uh Oh .. There Are Concerns From The Sponsor and Product Owner It is normal that there will be concerns or objections pertaining to the cost and impracticality of Pair Programming. Many would question why there is a need to pay two people to do the same job when one can do it “perfectly fine.” Of course, the key assumption here is the idea of “perfectly fine”. 
It is good to explore what this means. If it means delivering the project or product with the symptomatic smells as mentioned earlier, and accepting all the risks (technical, staffing, maintenance, knowledge) that it brings, then there really isn’t a reason for Pair Programming. This assumption is based on attempting to locally optimise the capability of each individual for the short term rather than the collective whole for the long term. To be able to see Pair Programming in a positive light, a long term view is needed. This practice reduces staffing risk on the project or product. Although people might get sick or quit, this knowledge is already well shared within the Development Team. It increases the overall technical skill as experienced developers can do hands-on mentoring for junior developers. It also speeds up onboarding when there is a need to add more people to the team (with a caveat¹). In my experience, I have never encountered a case where the delivery speed dropped because a new person joined a team that does Pair Programming. In addition, the practice of Pair Programming helps to combat the unhealthy software development smells I mentioned earlier, which in turn, benefits the project or product development overall. If you are interested in a researcher’s perspective on this topic, check out Laurie Williams and Alistair Cockburn’s The Costs and Benefits of Pair Programming. Journey to the New World Adopting the perspective of the Satir Change Model, the practice of Pair Programming is a foreign element that is injected to the team’s Old Status Quo. You can almost predict that they will go through some stage of resistance and chaos. Strong, determined and well-gelled teams will probably be able to claw their way from chaos to the New Status Quo through relentlessly integrating the practice into the team’s life. In this case, there is less facilitation support you need to provide. 
However, the typical situation will involve a bit more help from the Scrum Master. Do not underestimate the adjustment needed for team members to transform to the New Status Quo, where Pair Programming becomes natural to the team’s practice. From Chaos to the New Norm Different people in the team have different Change Quotients² , or the ability to adapt to changes. Each person also brings their own expectations into how Pair Programming will be practiced in the team. These expectations are often assumed and creates friction whenever they differ. You can help the team to get on the same wavelength by facilitating their working agreement when it comes to Pair Programming. Focus on what values and behaviour they agree to bring into the pairing session. Consider having a discussion around how the embodiment of the 5 values of Scrum will help. Exploring Extreme Programming’s values of Communication and Feedback can be useful as well. Look into : how disagreements should be resolved (this is inevitable when you have more than 1 brain thinking) permission to raise uncomfortable issues between a pair: Hygiene issues (e.g. sharing of keyboard/mouse, bad breath or body odour, pairing with a sick person) and sometimes, perceived attitude or attention problems On the more practical side of things, have a starting agreement on the following areas (the team can always change it after trying it out): When to pair and when not to pair (a recommended minimum would be Pair Programming is a must on writing production code, while not mandatory for other activities such as experimenting or researching) How long to pair program before taking a break (Pair Programming will be more mentally taxing, and taking a short break will help to keep the pair productive). Try the Pomodoro Technique and see if it helps. Experiment with taking a break when the pair has finished integrating the code and has kicked off the automated build. The team can have other creative approaches. 
How often to swap pairs (a pair that sticks together for a long time may create a new silo knowledge, which is exactly what we are trying to avoid with Pair Programming). See if the team can adapt Promiscuous Pairing. Or perhaps a logical point to swap pairs (e.g. after a story is done), or even a time-based limit approach (every day, every half a day, every 2 hours, etc., but definitely not as long as every sprint). Try different ways to determine what is helpful. In the early days of Extreme Programming, there were even experiments that involved swapping pairs every 5 minutes. Extreme indeed. Core hours of pairing: Let’s face it — coding is not the only thing developers do. They need to reply to emails, spend time in backlog refinement, update time sheets, participate in daily scrums, and attend the occasional long-lunch appointments. In some cases, people also simply need some alone time ( This is Pair Programming, not Siamese-twin Programming). Setting a core hour of pairing helps each individual to better plan for the day. This also means that the core pairing hour cannot be exactly the same as working hours. It needs to be shorter so developers can do other things. If the idea of Pair Programming for most of the day is too high a bar, try experimenting with half-day pairing. Handling of interruptions and distractions: Unless you work in some sort of programmer’s paradise, you will never have the chance to do a full day of coding. Interruptions will happen. If core hours of pairing are established, developers can better plan meetings to happen outside of core hours to minimise interruptions. It is worth mentioning that the use of mobile devices can be particularly distracting to pairing (incoming calls and messages, social media updates, etc.). I have seen how the sudden popularity of smartphones in 2008 had impacted a good team’s Pair Programming practice. Unless there are urgent calls, the recommended agreement is not to have them around during pairing. 
After all, a healthy pair will take frequent breaks throughout the core pairing hours and they will have a chance to check their phones. Consider skipping the formal code review: If the team has an existing practice of code review in place or uses Git Flow’s Pull Requests, consider relaxing the requirement on codes that are pair programmed. Pair Programming produces a better form of code because code is reviewed on the fly. If one is unable to trust a pair’s work on the code base, you have a bigger issue that needs to be addressed. Sickness: If a person is sick, encourage the person to stay at home (if they must be in the office, then they should not pair up). Because knowledge is already shared through pairing, the team can operate well without the sick person around. Catching bugs would not be helpful (they have enough bugs to solve on their own, thank you very much!) On navigating the implementation of Pair Programming between two persons of varying background (e.g. experience/novice pairing, introvert/extrovert pairing, etc.), I highly recommend reading Laurie William’s Pair Programming Illuminated . Walk a Mile in Their Shoes As a Scrum Master, you can join them in feeling that unfamiliar change. Offer to pair with a developer. You do not need to pair with them for a full day. Pair for 30 mins. Pair for an hour. Switch and pair with another person. Reflect on your experiences. In what ways did you feel uneasy? In what ways did your pair feel uneasy? What did you notice about the environment or the space where you were paired ? “But I don’t know programming,” you say. Not a problem. Offer to be the Rubber Duck³ for a developer that is stuck. Helping the Team to Perform Pair programming can take some time to get used to. Like months. It requires a different mindset. The team will either adopt, reject or assimilate. Teams may fall back and return to the old status quo. Encourage teams to continue ironing out the issues. 
Run retrospectives to find out what issues they are facing, and what changes to make to improve their experience. Consider comparing development metrics before and after Pair Programming (defects count, bus factor, estimation variation, time spent on stories, code cyclomatic complexity, automated test coverage, and velocity vs number of team member availability) to see the improvement. This serves as tangible evidence to encourage the team to move forward. What does “performing” mean in Pair Programming? It means it is the default mode they will choose when working, even when under pressure. When that happens, the team has reached fluency in this practice. To borrow from the Agile Fluency Model’s view of fluency, it is “a habit of exhibiting the proficiency at all times, even when under pressure.” When you see that team members pair program by default (even under pressure), swap pairs often, naturally ask for pairing without hesitation when they are stuck or are working on a critical piece of work, you know that the team has reached the New Status Quo. Pair Programming is now the new norm. What’s Next ? Pair Programming can be a powerful catalyst to spread knowledge and skill in the team. Engineering practices like Test Driven Development and refactoring skills can be diffused within the team during pairing. This reinvigorates the health of the project or the development of a product on hand. It is also a powerful way to bring new team members up to speed with the least amount of disruption to their work. However, just like a plant, the team needs to nurture and protect this practice continuously. One way it needs to be protected is when the team composition changes. It is important to set the expectations for new team members when working in such an arrangement. A good way to have an alignment is to add Pair Programming into the interview process. 
This allows the interviewer to assess the candidate's skill and openness to collaboration, while the candidate can get a feel for what the potential new environment is going to be like. It helps them make a well-informed decision before joining the team.

Conclusion

When it comes to Pair Programming in a team, there is more to gain than to lose. However, the benefits do not come cheap. Investments are necessary. There will be an initial cost in time while the team moves towards the new status quo. Teams and their sponsors need to look at how they operate and decide if they need these benefits. In most cases, they will. It is in the stakeholders' best interests to weigh the pros and cons, and to recognise what they are giving up and the risks they are creating if they decide not to invest in this practice.

¹ This does not magically nullify Fred Brooks's law from The Mythical Man-Month: "adding manpower to a late software project makes it later."

² From Change Artistry (by Esther Derby, Gerald M. Weinberg, Johanna Rothman and Don Gray), Change Quotient "relates to how open a person is to change."

³ Rubber Ducking: sometimes the brain thinks too fast for its own good. Details may be skipped and assumptions stay hidden. By explaining a problem verbally to a Rubber Duck (or in this case, you), the person is forced to slow down (you can only think as fast as you speak). By verbalizing it, the thought process is made explicit and assumptions are surfaced. This helps to bring more understanding of the problem to light. And the best thing about being a Rubber Duck is that you do not need to say anything! Just lend a listening ear. What an easy way to help remove an impediment :)

The Weird Parts Of J...

My first experience with code was during National Service when I purchased a book on Python and wrote my first “Hello World!” file. Since then, I have progressed to pick up other programming languages such as JavaScript, Ruby and Java. The following is a summary of the quirks and odd features of JavaScript that I have came across in the last year. The typeof operator The typeof operator returns a string stating the type of the input value. typeof 'hello world!'; // 'string' typeof 123; // 'number' typeof undefined; // 'undefined' typeof true; // 'boolean' typeof { a: 1, b: 2, c: 3 }; // 'object' function sayHello() { console.log('Hello world'); } typeof sayHello; // 'function' There is no ‘array’ type in JavaScript, use Array.isArray() to check for an array typeof [1, 2, 3, 4, 5]; // 'object' Array.isArray([1, 2, 3, 4, 5]); // true Performing mathematical operations on non-numbers results in NaN (not a number) Somehow NaN (not a number) is a number? const foo = 5 / 'hello'; // NaN typeof foo; // 'number' Ever heard the saying “Everything in JavaScript is an object”? typeof null; // 'object' More on NaN Trying to evaluate NaN to be equal to anything will result in false. 51 === NaN; // false 'hello' === NaN; // false ['this', 'is', 'an', 'array'] === NaN; // false null === NaN; // false Evaluating NaN with NaN results in false too. NaN == NaN; // false NaN === NaN; // false const notANumber = 'abc' - 100; // NaN notANumber === NaN; // false We can check for NaN using the built in isNaN() function. It converts the input value to type Number before returning true for a NaN value. const notANumber = 'abc' - 100; // NaN isNaN(notANumber); // true isNaN('hello world'); // true isNaN('12345'); // false - Number('12345') returns 12345 Implicit Coercion Explicit coercion is an obvious attempt from the author to convert a value of one type to another type. 
const str = '12345';
typeof str; // 'string'
const num1 = parseInt(str);
const num2 = Number(str);
typeof num1; // 'number'
typeof num2; // 'number'

Implicit coercion can be unclear and may be an unintended side effect. Are these strings or numbers?

const str = '12345';
typeof str; // 'string'
// Using the + operator
const plus = +str;
typeof plus; // 'number'
// String representation of a Number * 1
const times = str * 1;
typeof times; // 'number'

Using !! gives the Boolean value of the input, indicating whether it is truthy or falsy.

const zero = 0;
const one = 1;
!!zero; // false
!!one; // true
const str = 'Hi this is a string.';
const emptyStr = '';
!!str; // true
!!emptyStr; // false

Here are some more examples of implicit coercion that get confusing.

!![]; // true
+[]; // 0
+!+[]; // 1
!+[] + !+[]; // 2
[+!+[]] + [+[]]; // '10'
[][[]]; // undefined
+[![]]; // NaN
typeof ([] + []); // 'string'
typeof +[]; // 'number'
typeof ![]; // 'boolean'

Scope & hoisting

Variables declared with var are function scoped.

function someFunction() {
  for (var i = 0; i < 5; i++) {
    console.log(`Inside the loop, i is ${i}`);
  }
  console.log(`Out of the loop, i is ${i}`);
}
someFunction();
// 'Inside the loop, i is 0'
// 'Inside the loop, i is 1'
// 'Inside the loop, i is 2'
// 'Inside the loop, i is 3'
// 'Inside the loop, i is 4'
// 'Out of the loop, i is 5'

Variables declared with let are block scoped.

function anotherFunction() {
  for (let i = 0; i < 5; i++) {
    console.log(`Inside the loop, i is ${i}`);
  }
  console.log(`Out of the loop, i is ${i}`);
}
anotherFunction();
// 'Inside the loop, i is 0'
// 'Inside the loop, i is 1'
// 'Inside the loop, i is 2'
// 'Inside the loop, i is 3'
// 'Inside the loop, i is 4'
// 'ReferenceError: i is not defined'

Function declarations are hoisted to the top of their scope, so they can be called before they are declared. Function expressions, on the other hand, are not hoisted along with their value, so they cannot be called before they are assigned.
helloDeclaration(); // 'hello function declaration'
// Declaration
function helloDeclaration() {
  console.log('hello function declaration');
}

helloExpression(); // ReferenceError: Cannot access 'helloExpression' before initialization
// (declared with var instead of const, this call would throw TypeError: helloExpression is not a function)
// Expression
const helloExpression = function() {
  console.log('hello function expression');
};

Variable declarations made with var are hoisted to the top of their scope, but the assignment of values is left in place for runtime execution.

a = 2; // this line (assignment) is left to be executed during runtime, it runs later
var a; // this line (variable declaration) is hoisted to the top of the scope, it runs first
console.log(`a is ${a}`); // 'a is 2'

Compare the above snippet with this one:

console.log(`a is ${a}`); // 'a is undefined'
var a = 2;
// this statement is broken into two parts: var a; and a = 2;
// var a (variable declaration) is hoisted to the top of the scope, a = 2 (value assignment) is not hoisted

Strict equality vs loose equality

JavaScript has == and === to check for equality, as well as != and !== to check for inequality. === is known as strict equality: it checks for both value and type equality. Loose equality, on the other hand, is represented by == and only checks for value equality. Coercion is allowed for ==, and JavaScript will attempt to convert the values to a common type.

const num = 123;
const str = '123';
num == str; // true
num === str; // false
1 == true; // true
1 === true; // false
0 == false; // true
0 === false; // false

Comparing arrays and objects

Arrays and objects are reference types. Comparing 2 different arrays/objects using == or === returns false, as they point to different arrays/objects in memory. To compare the elements of an array or the key/value pairs of an object, a deep comparison has to be done.
const arr1 = [1, 2, 3, 4, 5];
const arr2 = [1, 2, 3, 4, 5];
arr1 == arr2; // false
arr1 === arr2; // false
const obj1 = { a: 1, b: 2, c: 3 };
const obj2 = { a: 1, b: 2, c: 3 };
obj1 == obj2; // false
obj1 === obj2; // false

Infinity and -Infinity

Number.POSITIVE_INFINITY is a numeric value representing infinity; it can also be written as Infinity. Number.NEGATIVE_INFINITY equates to negative infinity; it can also be written as -Infinity.

Infinity + Infinity; // Infinity
Infinity - Infinity; // NaN
Infinity * Infinity; // Infinity
Infinity / Infinity; // NaN
-Infinity + -Infinity; // -Infinity
-Infinity - -Infinity; // NaN
-Infinity * -Infinity; // Infinity
-Infinity / -Infinity; // NaN

The built-in Math object includes helpful methods such as Math.max() and Math.min(), which return the maximum and minimum of their inputs, respectively.

Math.max(1, 20, 300, 4000, 50000); // 50000
Math.min(-1, -20, -300, -4000, -50000); // -50000

What happens if no arguments are passed into Math.max() and Math.min()?

Math.max(); // -Infinity
Math.min(); // Infinity

It has been a quirky yet enjoyable experience over the past year and I look forward to learning more JavaScript 🙃
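As a follow-up to the sections on NaN and on comparing arrays and objects, here is a minimal sketch of what a deep comparison could look like. deepEqual is a hypothetical helper written for this post, not a built-in; it only handles plain objects, arrays and primitives, and it assumes there are no circular references. It relies on Object.is(), which (unlike ===) treats NaN as equal to NaN.

```javascript
// Hypothetical helper: structural comparison of plain objects, arrays and primitives.
// Assumptions: no circular references; Date, Map, Set and similar types are not handled.
function deepEqual(a, b) {
  // Object.is covers identical references and primitives, and treats NaN as equal to NaN.
  // (Note that it also distinguishes 0 from -0.)
  if (Object.is(a, b)) return true;

  // From here on, both values must be non-null objects to have any chance of matching.
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }

  // An array never equals a plain object, even when both are empty.
  if (Array.isArray(a) !== Array.isArray(b)) return false;

  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;

  // Every key of a must exist on b and hold a deeply equal value.
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && deepEqual(a[key], b[key])
  );
}

console.log(deepEqual([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])); // true
console.log(deepEqual({ a: 1, b: [2, 3] }, { a: 1, b: [2, 3] })); // true
console.log(deepEqual([1, 2], [1, 2, 3])); // false
console.log(deepEqual(NaN, NaN)); // true

// Number.isNaN() does not coerce its argument, unlike the global isNaN().
console.log(isNaN('hello world')); // true
console.log(Number.isNaN('hello world')); // false
```

For production code, a well-tested library function is usually a safer choice than a hand-rolled comparison like this one.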

A Simple Guide To Cy...

Hello everyone! I am a Test Automation Engineer trying to look at other options apart from conventional Selenium… and finally came across Cypress! This post is all about explaining my first-hand experience with Cypress, a test automation tool. Why Cypress? Let’s look at a common example to explain how it works. We will open a Google webpage in Firefox browser and check for the Google Search button. To understand Cypress Edge, we need to understand Selenium Architecture first. In a nutshell, this is the test automation process that happens with conventional Selenium. Selenium consists of two components. Bindings: Libraries for different programming languages that we use to write our tests with. WebDriver: This is a program that can manage and fully control a designated and specific browser. The important thing to note here is that these two components communicate over HTTP by exchanging JSON payload. This is well defined by WebDriver Protocol, which is a W3C Candidate Recommendation. Every command used in tests results in a JSON sent over the network. This network communication happens even if the tests are run locally. In this case, requests are sent to localhost where there is a loopback network interface. First, a specific driver for the browser is initialised (WebDriver is an interface and Firefox Driver is Class Implementing Interface). Once the corresponding WebDriver is initialised, JSON Wire Protocol is called by Implementing class. In our case, Firefox Driver and session is created to execute subsequent commands first. Subsequently, Web Element Button is created and for each each action we give in language binding, JSON Wire Protocol is called, which travels on a network over HTTP. Still Confusing? In short, for each line of Selenium code, JSON Wire Protocol is called and in turn, talks to the browser on network over HTTP methods (Get, Post, Put, Delete). Bottom Line? 
The Selenium architecture works through the network, and this brings delay, which can sometimes be significant. Cypress addresses this issue by changing the process. Cypress has its own mechanism for manipulating the DOM in the browser. Cypress runs directly in the browser, so there is no network communication involved. By running directly in the browser, Cypress has access to everything in the browser, including your application under test.

Selenium vs Cypress

Cypress: Automation Testing Framework

Cypress is an automation testing tool built for modern web applications written with React, Vue.js, Angular, etc. Cypress is a test automation tool, but it is not based on Selenium and is fundamentally different from it: Selenium WebDriver works outside the web browser, whereas Cypress works directly inside the browser, on the DOM elements. Initially, it was developed for developers to do unit testing. However, it was later extended so that testers could do end-to-end automation testing. Cypress is installed through npm and its tests are written in JavaScript. If you have experience with JavaScript, it is easy to work with Cypress.

Advantages:
Execution speed is high.
Can capture videos and take screenshots.
Easy debugging.
Able to visualise which test and command is running in your app.

Disadvantages:
It only supports the Chrome browser (Firefox support is in progress).
It only supports JavaScript.
It does not support any native or mobile events.

Pre-Requisites:
Install Node.js.
Install any IDE, such as Visual Studio Code.

Let's Play With Your First Cypress Test:

1. Open Visual Studio Code IDE.
2. Create a folder.
3. Open a terminal: click View -> Terminal.
4. npm init is the initialiser, which is used to set up a new or existing npm package. Type the command in the terminal as: npm init -y
5. The command below is used to install the Cypress package. Type the command in the terminal as: npm install cypress --save-dev
6. The command below is used to open the Cypress environment. Type the command in the terminal as: ./node_modules/.bin/cypress open
7.
Click the "OK, Got It" button in the Cypress GUI.
8. In the Cypress GUI, we have default sample specs, and you can just click any one of the spec files. You will be able to view the automation test for that particular spec file. Now, enjoy the sweet feeling of successfully installing and running the Cypress test automation framework!
9. In your project folder, the structure should look like this:

Let's start with our scripts: under the framework structure, we will store our spec file in the integration folder. Delete the default examples folder and its scripts. Create a new script file with the .spec.js extension (example: test.spec.js).

├── cypress
│ ├── integration
│ │ ├── test.spec.js

Add the code below to your spec file (test.spec.js):

describe("Verify user should be able to search a keyword in Google", () => {
  // Open the Google home page
  it("Launch", function () {
    cy.visit("https://www.google.com/");
  });
  // Type the keyword into the search box and check the value
  it("Enter the search keyword", function () {
    cy.get(".gLFyf").type("Palo IT").should("have.value", "Palo IT");
  });
  // Click the search button
  it("Click on search button", function () {
    cy.contains("Google Search").click();
  });
  // Check the title of the results page
  it("Verify the search title", function () {
    cy.title().should("eq", "Palo IT - Google Search");
  });
});

Cypress has adopted Mocha's BDD syntax, with describe(), context(), it(), etc. It's a very useful way to keep the tests easy to read, like a feature file in Cucumber.

3. (A) Enter the command below in your terminal to run the script in the Chrome browser.

./node_modules/.bin/cypress open

The Cypress GUI should look like this; then click the link test.spec.js. Your script will run and display the results in your Chrome browser. Close the Cypress GUI.

3. (B) Enter the command below in your terminal to run the script in Electron (headless browser).

C:\Users\Jeevan\Desktop\Jeevan\Cypress_Framework> npx cypress run

"Videos" are added automatically when you run in a headless browser. Finally, execute your written script in Cypress. And that's it!
Thanks for reading and stay tuned for more updates about this topic!
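To make the earlier point about Selenium's network round trips more concrete, the sketch below lists the kind of HTTP requests a WebDriver client issues for the "open Google and click the Search button" example. It only builds the requests and does not send them. The endpoint paths follow the W3C WebDriver specification; the webDriverRequests function, the localhost URL, the placeholder session and element ids, and the CSS selector are assumptions made for illustration (real ids come back in the driver's responses).

```javascript
// Builds (but does not send) the HTTP requests a WebDriver client would issue.
// sessionId and elementId are placeholders: a real client reads them from the
// responses to the "new session" and "find element" requests.
function webDriverRequests(baseUrl, sessionId, elementId) {
  return [
    {
      // 1. Create a browser session
      method: 'POST',
      url: `${baseUrl}/session`,
      body: { capabilities: { alwaysMatch: { browserName: 'firefox' } } },
    },
    {
      // 2. Navigate to the page under test
      method: 'POST',
      url: `${baseUrl}/session/${sessionId}/url`,
      body: { url: 'https://www.google.com/' },
    },
    {
      // 3. Locate the button (the selector is an assumption about the page markup)
      method: 'POST',
      url: `${baseUrl}/session/${sessionId}/element`,
      body: { using: 'css selector', value: 'input[name="btnK"]' },
    },
    {
      // 4. Click the element that was found
      method: 'POST',
      url: `${baseUrl}/session/${sessionId}/element/${elementId}/click`,
      body: {},
    },
  ];
}

const requests = webDriverRequests('http://localhost:4444', 'SESSION_ID', 'ELEMENT_ID');
for (const request of requests) {
  console.log(request.method, request.url, JSON.stringify(request.body));
}
```

Four commands mean four HTTP round trips, even on localhost. The equivalent Cypress commands (cy.visit, cy.get, click) run inside the browser without that per-command hop, which is the difference this post describes.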

Retour d’expérience ...

This is largely due to relatively recent advances in Natural Language Processing (NLP), and to the fact that text interfaces are very common (and very accessible) on mobile. Our innovation lab, Marmelab, decided to explore this technology by building a concrete project named Tobaccobot. It is a virtual coach for quitting smoking in one month, with which the user communicates exclusively by SMS.

The interface

The principle of the bot is very simple: a person who wants to quit smoking signs up for the programme through a web page, with their name and phone number. From then on, all interactions happen by SMS. The smoker receives a message asking how many cigarettes they smoked that day. From the answer to this question, the bot determines a maximum number of cigarettes not to exceed during the coming week, with the aim of helping the smoker quit completely within 4 weeks. Every morning, the smoker receives an SMS asking how many cigarettes they smoked the day before, in order to assess their progress. Depending on the answer, the bot encourages or scolds them. The replies have to vary from one day to the next, so as not to bore the smoker. At the end of each week, the bot sets a new target, necessarily more ambitious than the previous week's. At the end of the 4th week, the bot determines whether or not the smoker has managed to quit. It sends a farewell message and the conversation ends there. At any time, the smoker can decide to stop the programme.

Note: We are not tobacco specialists at Marmelab, and truth be told, there isn't even a heavy smoker among us. This use case was simply chosen to support a technical experiment. If this virtual coach helps someone quit smoking one day, we will have killed two birds with one stone!
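To illustrate the weekly targets described above, here is a minimal sketch of one possible schedule. It is an assumption, not Tobaccobot's actual algorithm: the text only says that each weekly target is more ambitious than the last and that the goal is zero after 4 weeks. The weeklyTargets function and its linear reduction from the initially reported daily count are hypothetical.

```javascript
const PROGRAM_WEEKS = 4; // the programme described above lasts 4 weeks

// Hypothetical schedule (assumption): reduce the daily maximum linearly from the
// initially reported count down to 0 in the last week.
function weeklyTargets(initialPerDay) {
  if (!Number.isInteger(initialPerDay) || initialPerDay < 0) {
    throw new RangeError('initialPerDay must be a non-negative integer');
  }
  const targets = [];
  for (let week = 1; week <= PROGRAM_WEEKS; week++) {
    const remainingShare = (PROGRAM_WEEKS - week) / PROGRAM_WEEKS;
    targets.push(Math.floor(initialPerDay * remainingShare));
  }
  return targets;
}

console.log(weeklyTargets(20)); // [ 15, 10, 5, 0 ]
```

A real implementation would probably adjust each new target to the smoker's actual progress during the previous week rather than fixing the whole schedule up front.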
The conversation workflow

It is not easy to find a formalism to model a conversational interface. We tried drawing boxes and arrows, and ended up with a diagram of the conversation workflow.

Note: After development had started, we discovered a great tool for modelling a workflow from a text description: code2flow.

The technologies used

To implement this SMS virtual coach, we chose the following technologies:

- Node.js for the server side, in serverless mode with AWS Lambda
- DynamoDB for storing the smoker's state
- Octopush for sending and receiving SMS
- nlp compromise for natural language processing (NLP)

We will come back in detail to the reasons for choosing these technologies and how we use them in the following sections. If you want to see code, jump to the end of the article for the link to the project source, which we publish under the MIT license.

What?! No Botkit!

In the Node.js world, the reference library for implementing chatbots is Botkit. This very popular library, although of excellent quality, does not fit our use case.

First, Botkit mainly targets chat platforms (Slack, Messenger, etc.) and does not support sending and receiving SMS. There is botkit-sms, but that project is not very active and uses Twilio, whereas we chose Octopush. We would have had to develop our own adapter.

Next, Botkit is designed to listen on a port for incoming messages. It is a daemon, a Node process that never stops. With serverless, however, the service must stop after processing each message, and is forcibly stopped if it does not return within 5 seconds. We would have had to force Botkit to exit after each message by killing the Node process, which is not very clean.
Finally, since it is designed to run as a background task, Botkit keeps conversation context in memory. That context is wiped at every restart, so out of the box it is not easy to keep a conversation context in serverless mode. It is of course possible to give Botkit a custom conversation storage (to save to DynamoDB in our case). But Botkit imposes three tables: users, channels and teams, at least two of which make no sense in our case (channels and teams). We would still have had to implement them, or at least mock them.

Given all these limitations, we decided that Botkit was not appropriate for our application.

AWS Lambda

You may know the principle of AWS Lambda: it is a PaaS (platform-as-a-service) hosting offer similar to Heroku, where you only deploy... functions. In this context, an application is a set of functions called in response to events (for example an HTTP request or a cron). API Gateway, another Amazon service, takes care of routing calls to an HTTP API to a lambda function that computes the response. This makes it possible to run code only when it is needed, and to do without a web server.

Our bot's activity is very sporadic: it prompts the user once a day, and expects only one answer per day. Running a server for nothing 99% of the time would be a waste here; the AWS Lambda approach is a perfect fit.

Under the hood, AWS uses Docker to store lambda functions. It wakes up a container when a lambda is called, and puts it back to sleep after a few minutes of inactivity. But all this happens automatically, and the developer only sees functions. So apart from the API Gateway “server”, which is really just a giant shared reverse proxy, AWS bills lambda hosting only per function call, that is, per request.
And it is extremely cheap (the first million requests are free).

Serverless

Serverless is an open-source JS library that makes AWS Lambda easy to use by automating configuration and deployment on AWS. It handles not only AWS Lambda, of course, but also API Gateway for HTTP events, as well as cron and DynamoDB for the database in our case. Serverless uses a configuration file, serverless.yml, in which you declare the lambdas (functions) and the resources (resources) they use. Here is tobaccobot's as an example:

    service: tobaccobot

    functions:
      botConversation:
        handler: src/serverless/index.botConversation # the function exported under the name botConversation in index.js
        events: # what triggers the call to this function
          - http: # the HTTP part configures API Gateway
              method: POST
              integration: lambda
              path: bot_conversation # the path in the URL
              cors: true # the HTTP API accepts calls from any domain (CORS)
      getBotConversation:
        handler: src/serverless/index.botConversation
        events:
          - http:
              method: GET # the same function must answer POST and GET, an Octopush constraint (see below)
              integration: lambda
              path: bot_conversation
              cors: true
      dailyMessage:
        handler: src/serverless/index.dailyMessage
        events:
          - schedule: # here the call is triggered by a cron, not by an HTTP request
              rate: cron(0 8 ? * * *)
              enabled: true
      setupTables:
        handler: src/serverless/index.setupTables # no events, so it can only be called through the AWS API
      subscribe:
        handler: src/serverless/index.subscribe
        events:
          - http:
              method: POST
              integration: lambda
              path: subscribe
              cors: true
      reportData:
        handler: src/serverless/index.reportData
        events:
          - http:
              method: POST
              integration: lambda
              path: report_data
              cors: true

    resources:
      Resources:
        # a DynamoDB table to store the smokers' data
        DynamoDbSmokerTable: # resource names must be unique
          Type: AWS::DynamoDB::Table
          Properties:
            TableName: smoker
            AttributeDefinitions:
              - AttributeName: phone
                AttributeType: S # string
            KeySchema:
              - AttributeName: phone
                KeyType: HASH
            ProvisionedThroughput:
              ReadCapacityUnits: 5
              WriteCapacityUnits: 5
        # an IAM policy to let the lambdas access this DynamoDB table
        DynamoDBSmokerIamPolicy: # policy names must be unique too
          Type: AWS::IAM::Policy
          DependsOn: DynamoDbSmokerTable
          Properties:
            PolicyName: lambda-dynamodb-smoker # this name must also be unique
            PolicyDocument:
              Version: '2012-10-17'
              Statement:
                - Effect: Allow
                  Action:
                    - dynamodb:DescribeTable
                    - dynamodb:GetItem
                    - dynamodb:PutItem
                    - dynamodb:UpdateItem
                    - dynamodb:DeleteItem
                    - dynamodb:Scan
                  Resource: arn:aws:dynamodb:*:*:table/smoker
            Roles:
              - Ref: IamRoleLambdaExecution
        # another DynamoDB table to store the data of smokers who reached the end of the program
        DynamoDbArchiveTable:
          Type: AWS::DynamoDB::Table
          Properties:
            TableName: archive
            AttributeDefinitions:
              - AttributeName: id
                AttributeType: S
            KeySchema:
              - AttributeName: id
                KeyType: HASH
            ProvisionedThroughput:
              ReadCapacityUnits: 5
              WriteCapacityUnits: 5
        # like the previous one, it needs a policy to be accessible
        DynamoDBArchiveIamPolicy:
          Type: AWS::IAM::Policy
          DependsOn: DynamoDbArchiveTable
          Properties:
            PolicyName: lambda-dynamodb-archive
            PolicyDocument:
              Version: '2012-10-17'
              Statement:
                - Effect: Allow
                  Action:
                    - dynamodb:DescribeTable
                    - dynamodb:GetItem
                    - dynamodb:PutItem
                    - dynamodb:UpdateItem
                    - dynamodb:DeleteItem
                    - dynamodb:Scan
                  Resource: arn:aws:dynamodb:*:*:table/archive
            Roles:
              - Ref: IamRoleLambdaExecution

    provider:
      name: aws
      runtime: nodejs4.3
      stage: dev
      region: eu-west-1
      cfLogs: true

    plugins:
      - serverless-webpack
      - serverless-offline

    custom:
      webpack: ./webpack.config.serverless.js # our webpack config
      serverless-offline: # the config for local execution
        babelOptions:
          presets: ["es2015-node4", "es2016"]
          plugins: ["add-module-exports", "transform-runtime"]

Serverless provides its own version of the aws-sdk package, already configured with the right access, IAM access included.

Serverless: the pitfalls

The main catch with serverless is the development environment. A developer has no AWS service running on their workstation. How do you test lambda functions in that context? The serverless-webpack plugin can serve lambdas locally, but it does not follow the API Gateway specification. Fortunately, there is the serverless-offline plugin, which emulates AWS Lambda and API Gateway. It also accepts a Babel configuration. It is a must-have!

Serverless had a major update between versions 0.5 and 1.0, and a lot of documentation about the previous version can still be found. Do not be surprised if copy-pasting from Stack Overflow gets you nowhere, and read the official docs.

Lambda logs can be read with the command serverless logs -f [lambdaName]. Regardless of the number of containers used by AWS, all the logs of a lambda are gathered in chronological order. Serverless automatically records the output of console.error() and console.info(), but it ignores console.log().

API Gateway can only return JSON. It is therefore impossible to use a lambda to serve HTML, or a generated image.

Node.js

As for the code itself, AWS Lambda uses Node 4.3.2.
Serverless compresses the lambda function's code into a zip file. Node packages are not included, and AWS Lambda does not install them on its side. To use external packages, you therefore have to concatenate your code and its dependencies into a single function, which is the job of a module bundler. We chose Webpack, which we commonly use for frontend development. Serverless also provides a webpack plugin to automate building the files to deploy:

    plugins:
      - serverless-webpack
    custom:
      webpack: path/to/webpack.config.js

And while we are using webpack, we might as well add Babel too, to enjoy the latest ES6 features. Nothing new on that front.

One drawback of webpack is that some libraries commonly used on the server side no longer work. This is the case of config, for example, which reads configuration files at runtime. The problem is mitigated by a plugin that reproduces the mechanism transparently: webpack-config-plugin.

DynamoDB

DynamoDB is a relatively simple key/value database, similar to Redis. It lets you define a table with a partition key that serves as the unique identifier of the object. If you want, you can add a sort key, but in that case the partition key is no longer unique and it is the sort key that tells items apart. In our case we chose a partition key alone: the user's phone number.

Apart from the keys, a DynamoDB document has no validation and accepts any format.

aws-sdk provides the DynamoDB object to query the DynamoDB service. DynamoDB also comes with a very easy-to-use web interface.
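To make the difference between a partition key alone and a partition key plus a sort key more concrete, here is a toy in-memory model. It is only an illustration: createToyTable, put and count are made-up names, the "day" attribute and sample values are invented, and nothing here is the aws-sdk API.

```javascript
// Toy in-memory table (hypothetical; this is not the aws-sdk or DynamoDB API).
// Items are stored under their primary key: the partition key alone,
// or the partition key combined with the sort key when one is configured.
function createToyTable(sortKeyName) {
  const items = new Map();
  const keyOf = (item) =>
    sortKeyName ? `${item.phone}|${item[sortKeyName]}` : item.phone;
  return {
    put(item) { items.set(keyOf(item), item); },
    count() { return items.size; },
  };
}

// Partition key only: the phone number identifies a single item.
const smokerTable = createToyTable();
smokerTable.put({ phone: '+33600000000', name: 'john' });
smokerTable.put({ phone: '+33600000000', name: 'jane' }); // same key: replaces the first item

// Partition key + sort key: several items can share the same phone number.
const historyTable = createToyTable('day');
historyTable.put({ phone: '+33600000000', day: 'day-1', cigarettes: 12 });
historyTable.put({ phone: '+33600000000', day: 'day-2', cigarettes: 10 });

console.log(smokerTable.count(), historyTable.count()); // 1 2
```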
DynamoDB: the pitfalls

DynamoDB returns objects with a slightly unusual structure, which specifies the type of each field:

    {
      name: {
        S: 'john' // a key is added to specify the attribute's data type, here a string
      }
    }

Converting this format to plain JSON and back is tedious. Fortunately, the dynamodb-oop library performs this transformation and offers a slightly more pleasant API. There are nonetheless 2 points to watch out for:

- A getItem operation returns an empty object ({}) rather than null when the object was not found.
- The createTable and deleteTable operations, although they accept a callback, return once the operation has been initiated, not once it has completed. To make sure this kind of operation has finished, use waitFor, which waits for an event, here tableExists and tableNotExists.

For example, for createTable:

    function createTable(params) {
      return new Promise((resolve, reject) => {
        dynamoDB.on('error', (operation, error) => reject(error));
        dynamoDB.client.createTable(params, (err) => {
          if (err) {
            reject(err);
            return;
          }
          // createTable only starts the operation: wait until the table really exists
          dynamoDB.client.waitFor('tableExists', params, (errTableExists, result) => {
            if (errTableExists) return reject(errTableExists);
            return resolve(result);
          });
        });
      });
    }

Note that on the AWS side, serverless handles table creation automatically.

To emulate DynamoDB storage locally, there is a dynamodb-local module. It does not, however, offer a web interface to browse and edit DynamoDB content easily. dynamodb-local only provides a far too limited console, since it requires you to code the operations in JavaScript using aws-sdk. This console is available on port 8000.

Octopush

For sending SMS we chose Octopush, which is the cheapest, despite an API geared towards advertising campaigns. A node module exists for using Octopush: octopush.
Usage is very simple:

    // Create an SMS instance with our credentials
    const sms = new octopush.SMS(config.octopush.user_login, config.octopush.api_key);

    // Then call a number of configuration functions, for example:
    sms.set_sms_text(message);
    sms.set_sms_recipients([phone]); // Careful, an array is required
    sms.set_sms_request_id(sms.uniqid()); // You can specify an identifier generated on your side
    ...

    // Send the SMS
    sms.send((error, result) => { ... });

Note that Octopush supports mail merge, as suggested by the fact that set_sms_recipients accepts an array of phone numbers. It is then possible to replace variables in the text. Unfortunately, there are only 5 of them:

- {ch1}, whose values are specified by calling sms.set_sms_fields_1([…])
- {ch2}, whose values are specified by calling sms.set_sms_fields_2([…])
- {ch3}, whose values are specified by calling sms.set_sms_fields_3([…])
- {prenom}, whose values are specified by calling sms.set_recipients_first_names([…])
- {nom}, whose values are specified by calling sms.set_recipients_last_names([…])

Octopush: the pitfalls

To handle user replies, you must provide a URL that Octopush will call with the reply. To comply with Octopush's specifications, this URL must respond immediately without returning any content. Processing by the client application must therefore happen asynchronously, after responding to Octopush.

Octopush also requires this URL to be reachable with GET so that it can be tested from a browser. Verification of this URL is not automated for now, and can take them up to a day.

Octopush only collects SMS sent in reply to a message awaiting a reply (option_with_replies). This means that if the user sends several messages in a row, only the first one is taken into account.
We needed a fourth variable for one of our messages and simply used the prenom variable in that case. At the time of writing, the Octopush documentation wrongly states that set_recipients_first_names replaces {nom} strings and that set_recipients_last_names replaces {prenom} strings.

Tobaccobot in detail

The conversation logic

The conversation workflow shows that this bot is in fact a perfectly classic state machine. An action (an HTTP request, a cron) moves the smoker object from one state to another according to certain rules. There are many libraries for implementing a state machine, but given the simplicity of tobaccobot's logic, there is no need to look further than a few nested ifs in a function.

The signature of this function is (state, action) => state. If you practice functional programming or React, you probably recognize this pattern: it is a reducer. One library became famous for implementing this pattern for React: redux. Since we use it heavily on frontend projects, we naturally started with it to implement the conversation logic. In the end, though, redux brings nothing more than the native reduce() function in our case, and we ended up removing the dependency.
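As a minimal sketch of that (state, action) => state signature, and of why the native reduce() is enough, consider the snippet below. The state fields (status, remainingDays, lastAnswer) and the action types are hypothetical names chosen for illustration; they are not taken from the tobaccobot source.

```javascript
// Hypothetical reducer: a few nested ifs mapping (state, action) to a new state.
function smokerReducer(state, action) {
  if (action.type === 'DAILY_TICK') {
    // The cron asks the daily question and one day of the program is consumed.
    return Object.assign({}, state, {
      status: 'asked',
      remainingDays: state.remainingDays - 1,
    });
  }
  if (action.type === 'ANSWER_RECEIVED') {
    if (state.status === 'asked') {
      return Object.assign({}, state, {
        status: 'qualified',
        lastAnswer: action.cigarettes,
      });
    }
    return state; // an answer nobody asked for does not change the state
  }
  return state;
}

// Replaying a list of actions only needs Array.prototype.reduce.
const initialState = { status: 'qualified', remainingDays: 28 };
const replayedActions = [
  { type: 'DAILY_TICK' },
  { type: 'ANSWER_RECEIVED', cigarettes: 12 },
];
const finalState = replayedActions.reduce(smokerReducer, initialState);

console.log(finalState); // { status: 'qualified', remainingDays: 27, lastAnswer: 12 }
```

Because the reducer never mutates its input, initialState is left untouched, which makes each transition easy to test in isolation.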
Here is, for example, an excerpt of the code that, from the smoker's state as derived from the number of cigarettes smoked, works out the message to send:

    export default (evaluation) => {
      if (evaluation.backFromBad === 1) {
        return backFromBad();
      }
      if (evaluation.backFromBad === 2) {
        return backFromReallyBad(evaluation.targetConsumption);
      }
      if (evaluation.backFromBad > 2) {
        return backFromBadCombo();
      }

      const lastDelta = evaluation.delta.slice(-1)[0];
      const previousDelta = evaluation.delta.slice(-2)[0];

      if (lastDelta <= -3) {
        if (evaluation.delta.length >= 2 && previousDelta <= -3) {
          return continuedGreatProgress(lastDelta);
        }
        return greatProgress(lastDelta);
      }

      if (evaluation.state === 'bad') {
        if (evaluation.combo.hit === 2) {
          return reallyBad(reallyBadLinks[(evaluation.combo.repeatition - 1) % 3]);
        }
        if (evaluation.combo.hit > 2) {
          return badCombo(
            evaluation.combo.hit,
            evaluation.targetConsumption,
            badComboLinks[(evaluation.combo.repeatition - 1) % 3]
          );
        }
        return bad(evaluation.targetConsumption);
      }

      if (evaluation.combo.hit === 2) {
        return reallyGood();
      }
      if (evaluation.combo.hit > 2) {
        return goodCombo(evaluation.combo.hit);
      }
      return good();
    };

For the content of the messages backFromBad(), backFromReallyBad() and the others, have a look at the source.

Side effects

In our state machine, actions have two effects: changing the smoker's state, and a set of operations that are not reflected in the smoker's state (storage in DynamoDB, sending SMS, logs). This set of operations cannot be modelled by a pure function (in the functional programming sense); they are called side effects. Very often, these side effects are asynchronous operations. To handle these asynchronous operations, rather than using callbacks, we used generators, with the help of sg, a small library created by Marmelab.
sg handles the scheduling of asynchronous tasks with generators (as co.js does), but instead of returning promises directly, sg returns effects describing what to do (as redux-saga does). Generators make it possible to describe the flow of asynchronous actions in a synchronous style and, with effects, the ordering of operations can be tested without worrying about their implementation.

The most commonly used effect is call. It is simply the call of an asynchronous function.

For example, with the following generator:

    export default function* dailyMessageSaga(smokers) {
      const dailySmokers = yield call(getDailySmokers, smokers);
      const { asked = [], dubious = [], qualified = [] } = yield call(sortSmokersByState, dailySmokers);

      yield call(notifyDubious, dubious);
      // Users with asked state haven't answered the previous day, we send them a message for the current day anyway
      yield call(notifyQualified, [...asked, ...qualified]);
    }

It is possible to write the tests this way:

    describe('dailyMessageSaga', () => {
      let iterator;

      before(() => {
        iterator = dailyMessageSaga('users');
      });

      it('should call getDailySmokers with users passed to the saga', () => {
        const { value } = iterator.next();
        expect(value).toEqual(call(getDailySmokers, 'users'));
      });

      it('should call sortSmokersByState with users returned by getDailySmokers', () => {
        const { value } = iterator.next('dailySmokers');
        expect(value).toEqual(call(sortSmokersByState, 'dailySmokers'));
      });

      it('should call notifyQualified with qualified and asked key then notifyDubious with dubious key', () => {
        let { value } = iterator.next({ asked: ['asked'], qualified: ['qualified'], dubious: 'dubious' });
        expect(value).toEqual(call(notifyDubious, 'dubious'));
        value = iterator.next().value;
        expect(value).toEqual(call(notifyQualified, ['asked', 'qualified']));
      });
    });

Breakdown

Let's now move on to the implementation of our bot.
It is made up of 3 lambdas:

- subscribe responds to the form POST; it creates a user and sends the first SMS
- dailyMessage is run by a cron and sends the daily message to each user, the message being based on the user's state
- botConversation is called by Octopush and processes the user's replies

Let's go quickly over the subscribe lambda, which is triggered by a POST route called by a simple static form hosted on S3.

    subscribe:
      handler: src/serverless/index.subscribe
      events:
        - http:
            method: POST
            integration: lambda
            path: subscribe
            cors: true

Processing incoming messages

The botConversation lambda is called by Octopush through a POST route (the bot_conversation path declared in serverless.yml). The dailyMessage lambda, for its part, is triggered by a cron:

    dailyMessage:
      handler: src/serverless/index.dailyMessage
      events:
        - schedule:
            rate: cron(0 8 ? * * *)
            enabled: false

The AWS cron syntax takes 6 parameters: minutes, hours, day of month, month, day of week, and year. Day of month and day of week cannot both be set at the same time; to ignore one of the two, use the ? character.

The dailyMessage lambda retrieves all users with DynamoDB's scan command. scan accepts the parameters batchSize and exclusiveStartKey, which make it possible to run the command in batches. batchSize specifies the number of results to return, and exclusiveStartKey specifies the key from which to resume the query. The result of scan includes the last key returned. To run the processing in series, we use recursion on the generator.

    function* dailyMessage() {
      /// ... first batch
      yield* dailyMessage(lastKey);
    }

Next, each user is sorted according to their state, dubious or qualified, and the number of days remaining. Dubious users are those who signed up but never answered the first question, or answered it incorrectly. dailyMessage then prompts them again.
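The batched scan with a recursive generator can be sketched as follows. This is only an illustration under stated assumptions: fakeScan is an in-memory stand-in written for this example (it is neither the DynamoDB nor the dynamodb-oop API), the keys are placeholder letters, and the real lambda yields sg effects rather than plain arrays.

```javascript
// In-memory stand-in for a paginated scan (hypothetical, for illustration only).
const fakeKeys = ['a', 'b', 'c', 'd', 'e'];

function fakeScan(batchSize, exclusiveStartKey) {
  // Resume right after the given key, or from the beginning when there is none.
  const start = exclusiveStartKey === undefined ? 0 : fakeKeys.indexOf(exclusiveStartKey) + 1;
  const items = fakeKeys.slice(start, start + batchSize);
  const hasMore = start + batchSize < fakeKeys.length;
  // Like the scan described above, the result carries the last key when more data remains.
  return { items, lastKey: hasMore ? items[items.length - 1] : undefined };
}

// Each invocation handles one batch, then delegates to itself for the next one.
function* processAll(batchSize, startKey) {
  const { items, lastKey } = fakeScan(batchSize, startKey);
  yield items;
  if (lastKey !== undefined) {
    yield* processAll(batchSize, lastKey);
  }
}

const batches = Array.from(processAll(2));
console.log(batches); // [ [ 'a', 'b' ], [ 'c', 'd' ], [ 'e' ] ]
```

The recursion stops as soon as a batch comes back without a last key, so batches are processed strictly one after the other.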
Finally, users are sorted according to the number of days they have left:

- If they are at the end of the program: if their consumption has dropped to 0 cigarettes over the last 3 days, we congratulate them. Otherwise, we invite them to start again.
- If they are at the end of a week: we set a new target.
- In all other cases: we decrement the number of days remaining and ask the user how many cigarettes they smoked yesterday.

The implementation of this machine was fairly simple and consists of a few nested ifs; nothing very notable, apart from the benefit of sg, which simplifies side effects.

Natural language processing

More and more libraries are appearing that make natural language processing (NLP) accessible, notably in Node.js:

- nlp_compromise
- natural

NLP is a core topic for bots when it comes to handling questions. For our part, we only had to handle answers, and within a very narrow scope. nlp simply allowed us to retrieve the number of cigarettes from the messages sent by the user. Whether they answer "at least 15 cigarettes", "no more than fifteen cigarettes" or "15", nlp returns 15.

Conclusion

The tobaccobot project was an opportunity to get familiar with several technologies: serverless, AWS Lambda, AWS DynamoDB, Octopush. Serverless is a powerful tool, but setting up the right development environment took a lot of experimentation to find the right configuration. In addition, we spent a lot of time reading documentation and configuring the serverless environment compared to a traditional server. That said, now that this work is done, setup will be much faster in the future.
Once the serverless part was in place, the bot itself turned out to be simple to implement, since it comes down to taking an event (SMS or cron) and a state as input, then updating the state and generating a message as output. Modelling the conversation is therefore the hardest part. It would have been interesting to handle an interaction with a group of users, or a more varied interaction. All things considered, this remains a good introduction to building a bot.

The code of our tobaccobot is available on GitHub: https://github.com/marmelab/tobaccobot

Read the original article by clicking here!

Visually test your a...

Alexandre Delattre's (Viseo) quickie on marble testing with Rx (JS/Java/…) at DevFest Toulouse 2017 was particularly interesting.

What is Rx?

Rx is a library for composing asynchronous and event-based programs by using observable sequences. It provides one core type, the Observable, satellite types (Observer, Schedulers, Subjects) and operators inspired by Array#extras (map, filter, reduce, every, etc) to allow handling asynchronous events as collections. - From RxJS doc

We can use Rx in the frontend (for combining service calls and building reactive user interfaces) as well as in the backend (combining micro-service calls, websockets, …).

The problem

The current trend is to move from imperative programming to reactive functional programming. With the tools at our disposal, testing asynchronous behaviours is hard, and developers often just skip this important step. But it is possible, and now simpler than ever. So how do we check that our streams unfold the way we want them to? You guessed right: with marble testing.

Marble diagrams

To represent Observables, we use marble diagrams. They are drawn as a horizontal timeline, with events occurring as visual nodes. A typical example is the diagram of the merge function, which takes two observables and returns a merge of the two. You can refer to the RxMarbles website to find interactive diagrams of Rx Observables.

To use them in code, we define an ASCII notation. First, we define the time frame (default is 10ms).
Then we can look at the different symbols that we need:

    -  : nothing happens during one frame
    |  : the observable completes (onComplete)
    #  : the observable errors (onError)
    x  : the observable emits a value (onNext)
    ^  : subscription point of an Observable (only for hot Observables)
    () : value grouping

Example of a mobile weather application

For this example application, the speaker chose Kotlin, but we could do the same with any language and platform supported by Rx (see the full list on the ReactiveX site).

Application requirements

We have an “instant search” application, where users type the name of their city. After a 500ms delay, we launch the search, and a loading indicator is shown to the user during the search. Then the result is displayed, or an error if need be.

Interfaces

Our available interfaces are the following:

    interface WeatherViewModel {
        // Inputs
        val city: Subject<String>

        // Outputs
        val state: Observable<State>
        val weather: Observable<WeatherData>
    }

    sealed class State
    object Idle : State()
    object Loading : State()
    data class Error(val e: Throwable) : State()

    data class WeatherData(
        val city: String,
        val pictoUrl: String,
        val minTemperature: Float,
        val maxTemperature: Float
    )

    interface WeatherService {
        fun getWeather(city: String): Single<WeatherData>
    }

Implementation

    city = BehaviorSubject.createDefault("")
    state = BehaviorSubject.createDefault(Idle)
    weather = city
        .filter { it.isNotEmpty() }
        .debounce(500, TimeUnit.MILLISECONDS, mainScheduler)
        .switchMap {
            weatherService.getWeather(it)
                .observeOn(mainScheduler)
                .doOnSubscribe { state.onNext(Loading) }
                .doOnSuccess { state.onNext(Idle) }
                .doOnError { state.onNext(Error(it)) }
                .toObservable()
                .onErrorResumeNext(Observable.empty())
        }

Use case diagram

For example, in this diagram, the user starts typing “Toulouse”, and after 500ms without activity (no key pressed), we call the web service to get the weather in Toulouse. The web service then returns the response (sunny weather).
Afterwards, the user wants to check the weather in Paris, so after the delay the web service is called, and then we get the response.

Marble testing implementation

    @Before
    fun setup() {
        weatherService = Mockito.mock(WeatherService::class.java)
        scheduler = MarbleScheduler(100)
        viewModel = WeatherViewModelImpl(weatherService, scheduler)
    }

The following are the values that we need for the test. We map the symbol “0” to the event “empty string”, the symbol “1” to the event “the user inputs tou”, the symbol “t” to the event “the user inputs toulouse”, etc.

    val cityValues = mapOf(
        "0" to "",
        "1" to "tou",
        "t" to "toulouse",
        "b" to "bordeaux"
    )
    val stateValues = mapOf(
        "i" to Idle,
        "l" to Loading,
        "e" to Error(weatherError)
    )
    val weatherValues = mapOf(
        "t" to weatherData,
        "b" to bordeauxData
    )

And this is the data that the mocked web service responds with.

    val weatherData = WeatherData("toulouse", "sunny", 20f, 30f)
    val bordeauxData = WeatherData("bordeaux", "cloudy", 10f, 15f)

So now, the test looks like this.

    @Test
    fun test2Cities() {
        val s = scheduler
        val cityInput = s.hot(
            "0-1-t------------b----------", cityValues)
        // debouncing:
        //       -----t       -----b
        `when`(weatherService.getWeather("toulouse")).thenReturn(s.single(
                     "--t", weatherValues))
        `when`(weatherService.getWeather("bordeaux")).thenReturn(s.single(
                                  "--b", weatherValues))
        s.expectObservable(viewModel.weather).toBe(
            "-----------t------------b---", weatherValues)
        s.expectObservable(viewModel.state).toBe(
            "i--------l-i----------l-i---", stateValues)
        cityInput.subscribe(viewModel.city)
        s.flush()
    }

We obtain an ASCII visual representation of the simulated user interaction, and then we tell the test what chain of events we expect to receive from the various observables. In this representation, we can visually check how the different timelines line up, and easily test that even complex chains of events actually lead to the observable that we want.
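To make the ASCII notation more tangible, here is a small parser that turns a marble string into a list of timed events. It is a sketch written in JavaScript rather than the Kotlin used in the talk, and it is not the MarbleScheduler API: parseMarbles and the event shape are hypothetical, and grouping is handled the way the RxJS marble syntax documents it (all values of a group happen on the frame of the opening parenthesis).

```javascript
// Minimal marble-string parser (illustrative only, not a real scheduler).
// One character is one frame; frameMs defaults to the 10ms mentioned above.
function parseMarbles(marbles, values = {}, frameMs = 10) {
  const events = [];
  let groupStart = -1; // frame of the opening parenthesis, -1 outside a group

  Array.from(marbles).forEach((symbol, frame) => {
    const time = (groupStart === -1 ? frame : groupStart) * frameMs;
    switch (symbol) {
      case '-': break;                                  // nothing happens
      case '(': groupStart = frame; break;              // start of a value group
      case ')': groupStart = -1; break;                 // end of a value group
      case '|': events.push({ time, kind: 'complete' }); break;
      case '#': events.push({ time, kind: 'error' }); break;
      case '^': events.push({ time, kind: 'subscribe' }); break;
      default: {
        // Any other symbol is an emitted value, looked up in the values map.
        const known = Object.prototype.hasOwnProperty.call(values, symbol);
        events.push({ time, kind: 'next', value: known ? values[symbol] : symbol });
      }
    }
  });
  return events;
}

const parsedEvents = parseMarbles('--t--(b|)', { t: 'toulouse', b: 'bordeaux' });
console.log(parsedEvents);
// [ { time: 20, kind: 'next', value: 'toulouse' },
//   { time: 50, kind: 'next', value: 'bordeaux' },
//   { time: 50, kind: 'complete' } ]
```

A test scheduler essentially does the reverse as well: it records what an observable emits on virtual time and compares it with the events parsed from the expected string.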
Conclusion

Pros

- Tests are more concise and expressive.
- Complex cases can be tested visually.
- Testing the global coherence and behaviour becomes possible.

Cons

- The API differs between platforms.
- Aligning marbles can be visually challenging in ASCII.

Possible improvements in the future

The speaker concluded by proposing future improvements to counter the cons:

- Standardisation of the APIs.
- Development of a graphical editor for marbles.

He added that it would be great and useful if someone at the conference wanted to get involved and develop a graphical editor.

Create a cross platf...

First experience with react-native and react-native-web

To create a native app with code sharing today, there are 2 main approaches:

- Hybrid app: written in JavaScript, HTML and CSS; the entire code is embedded and run in a web view on mobile, as with PhoneGap.
- JavaScript engine + native UI: written in JavaScript. UI components are translated into native UI components. The rest of the code runs in a JavaScript engine provided by the mobile system.

React Native is a framework that follows the second philosophy. It lets you create a mobile app using JavaScript. For a web app developer without much mobile background, it can be a good way to start a mobile app. React Native is based on React and shares its design, so it should integrate well with other React libraries. It is at version 0.56.RC at the time of writing, not yet a major version. But looking at who is actually using React Native (Facebook, YouTube, Skype, etc.), we can have confidence in it. React Native is a Facebook project.

To achieve real code sharing, we expect to get a web app at the same time without rewriting the UI part. That is where the “react-native-web” framework comes in: it brings the components and APIs of React Native to the web. As mentioned on the React Native Web home page, it is used in production by Twitter Lite. It is also supported by react-scripts.

So let's start experimenting with a cross-device application using these two frameworks. I want to go further than a hello world example, but let me start by initializing the project.

Initialize a project

There are two ways to initialize a React Native project.

Create React Native App

A quick way to create and start a mobile app if you have a device to run it on (otherwise you will need to install an emulator).

    npm install -g create-react-native-app
    create-react-native-app AwesomeProject

The project is set up with an “expo” configuration, so you can quickly run your native app within the Expo client app.
Running the scripts deploys the mobile app within an Expo container.

React-native-cli

In this case you will need a full mobile development environment to start with, which means Xcode for iOS, and Android Studio and the Android SDK for Android.

    npm install -g react-native-cli
    react-native init MyNote

The script creates two additional folders, “Android” and “iOS”, and initializes a default native app setup without “expo”. This should be the best way to initialize a standard project.

To launch a simulator, taking iOS as an example, you can run

    react-native run-ios

Or you can open the .xcodeproj in Xcode and run the project. You can also do this job later in your react-native project with

    react-native upgrade

Whichever way you initialize it, we now have a runnable native app project. So far so good. Everything goes well.

Configure the native project as a web app

React Native translates its UI components to native platform components for iOS and Android, and react-native-web does the same job for the web platform. Check its GitHub page.

We need to add a few things to make the web app available: the packages react-dom, react-native-web and babel-plugin-react-native-web.

In the entry point index.web.js, instead of the classic React way of rendering your application with the DOM, we do this the React Native way, using AppRegistry. So my entry point is something like this:

    import App from './App';
    import React from 'react';
    import { AppRegistry } from 'react-native';

    // Register the root component under the app name
    AppRegistry.registerComponent('MyNote', () => App);

    // Mount the registered app into the DOM node with id "react-native-app"
    AppRegistry.runApplication('MyNote', {
      initialProps: {},
      rootTag: document.getElementById('react-native-app')
    });

The thing is, react-scripts can launch a react-native project and automatically do the magical alias to react-native-web, but the embedded webpack config requires a specific folder structure that does not fit the structure created by react-native very well. So I created my own webpack.config.js and ran a webpack dev server.
In the webpack config, we need a babel loader for everything expected to be compiled by Babel, and we plug “babel-plugin-react-native-web” in here to take care of aliasing ‘react-native’ to ‘react-native-web’. Alternatively, you can do this in the resolve section of your module exports. And don't forget to set your entry to index.web.js.

After all this, I can now run my native app with Xcode and, on the other side, my web app with the script npm run web. When the code changes, a simple Cmd+R in the simulator or in the browser reloads the app.

That is quite a bit of setup for the web part; it's a pity that web app initialization is not included in the react-native init step. And now our development environment is ready.

Development: UI components and APIs

Development is very similar to classic React; we just use React Native components instead of DOM components.

The basic components of React Native are quite simple: View, Text, Image, TextInput, TouchableHighlight, etc. You can easily map them to DOM elements such as div, img, input and a. Most apps will end up using just these basic components.

Component style is defined by the ‘style’ prop. If you are familiar with CSS, the style names and values usually match how they work on the web.

The events and handlers are quite similar to the DOM as well. For example, a TextInput component has onChange, onKeyPress, onFocus and onBlur props.

As a web developer, you should be able to handle this part quite well.

More advanced native components are also available in react-native. The most common components are well supported in react-native-web. The latest version of react-native-web adds an implementation of SectionList. Still, there are platform-specific components; DatePicker is one of them. It is regrettable that iOS and Android could not agree on a DatePicker interface.

React Native provides a Platform API for writing platform-specific code.
For a DatePicker, for example, we could have something like this:

    // Illustrative only: DatePickerAndroid is an imperative API
    // (DatePickerAndroid.open()), so a real app would wrap it in a component.
    const DatePicker = Platform.select({
      ios: <DatePickerIOS />,
      android: <DatePickerAndroid />,
      web: <input type='date' />
    })

Many third-party libraries exist today to unify the code for the two mobile platforms (react-native-datepicker, for example), but few of them include web support.

Responsive

React Native components use the Flexbox layout. Flexbox is a helpful tool for creating a responsive app. On the web side it is translated into CSS flexbox properties, which means old browsers are not supported. The flexDirection, alignItems, justifyContent, alignSelf and flex properties are available in react-native and work the same way as in CSS.

Dimensions is another helpful API. Dimensions.get gives the current height and width, and you can build dynamic rendering and styling logic on top of it. The calculation should be done at every render to guarantee that the dimensions stay up to date after any change due to device rotation or browser resize. The Dimensions API also provides a change event listener.

The Platform API is another option for building rendering logic. In that case we usually want to differentiate between a small mobile screen and a large browser window on a laptop. Actually, Platform.OS and Platform.select have three possible values: ‘ios’, ‘android’ and ‘web’. I don't think they can distinguish an iPhone from an iPad, so your mobile screen layout may not be suitable for a tablet.

Navigation

Navigation is a hard part of making code sharing successful. Each platform has its own way to manage navigation and history. Unfortunately, it is also one of the essential parts of an app. React Native does not provide an “official” API for navigation; it recommends some available navigation components. I tried React Navigation, which supports both the mobile and web platforms. However, after trying several combinations of React Native and React Navigation, I settled on version 0.54.0 of react-native and 1.5.8 of react-navigation.
That is because web support is broken after react-navigation 2.0, and I had several problems getting react-navigation 1.5.8 to work with other versions of react-native. We have to live with this instability in the JS world. Well, the fix for web is on the V2 roadmap.

React Navigation provides the basic navigation features such as navigate, history and routing. Advanced features that could be interesting in React Navigation:

- Sub-routing, multi-routing
- Deep linking on the three platforms
- Customizable navigator
- Customizable UI for navigation, such as header and tabs

Even though deep linking is supported, I didn't find any option to change the URL when the path changes on the web platform. That needs to be implemented manually.

Other classic features which do not need UI action work as well on a mobile device as in a web browser (async calls, await, integration with Redux, etc.), since the code runs in a JavaScript environment. If you use a JS library that does not reference the DOM API, you should not have any surprises.

Conclusion

React Native, with the help of react-native-web, offers a fairly simple way to create a cross-device application.

- It covers the essential requirements of an application, with the possibility to customize.
- It comes with a rich ecosystem around React.
- It does not require a strong mobile background to start and deploy a mobile app.
- It enables real code sharing between web and mobile in 90% of cases.
- UI development is very similar to HTML development.
- The react-native ecosystem is very dynamic.
- Compared with a hybrid app, a react-native app uses native components and APIs (on mobile platforms).

There are still some drawbacks:

- Integrating a web app with react-native requires some manual work.

Even so, react-native + react-native-web is still a good choice for building a cross-device app with real code sharing and significant productivity gains.

Pair-Programming: A ...

Of the myriad Agile software development practices, Pair-Programming is one that has never failed to fascinate me with its effectiveness and simplicity.

Pair-Programming, as the name implies, requires TWO developers to work on a single task or story. When adopted as the default mode of operation, an organization must be prepared to pay more upfront (in terms of man-hour costs) compared to traditional models of software development in which only one developer is needed per story. The real question is: Can this be justified so that stakeholders are convinced?

On a current product for which the team has the luxury of indulging in Pair-Programming and Test-Driven Development as the default mode of operation, I observed that Pair-Programming has brought about a number of benefits.

Ongoing development is never stopped because somebody is not around

There is almost zero downtime in day-to-day development for stories. On a regular working day, it is incredibly rare for both programmers paired on a story to take leave at the same time. If one developer should go on leave, the remaining developer could always easily pair up with another developer in the team to continue work without ever losing the context. This mitigates knowledge loss and reduces dependency on any single employee.

Code produced is more robust

Features are less likely to break downstream due to edge cases (whether during the QA phase or in staging/production). On the current project, uncaught downstream issues, particularly those in production, are expensive to fix as personnel would have to be activated. In this case, Pair-Programming moves manpower costs upstream while improving the application’s robustness. If you are in the finance/telco space, this might be an important consideration since downtime frequently translates to potential penalties from the regulatory authorities.
Onboarding time is significantly reduced

By pairing an expert with a novice (in terms of system knowledge), pair-programming provides an appropriate setting where the novice can ask questions easily and gain hands-on experience faster. By encouraging human interaction through swapping of pairs, the newcomer will also blend into the team much quicker compared to solo development.

Good team practices can be enforced

Having developers frequently rotated across different pairs helps to encourage the adoption of important practices like TDD, proper coding standards and version control. Shared responsibility of code commits also results in code that stands up to scrutiny since lazy coding is usually caught during pairing.

Potential Pitfalls

While pair-programming might bring about the above-mentioned benefits, it is important to note that it does come with its caveats and should not be adopted blindly.

Increased development time on some stories

Stakeholders, sponsors and development teams need to be aware that pair-programming does not translate to increased team velocity. In fact, the implementation might sometimes take longer due to the frequent discussions that take place between pairing developers to arrive at a suitable code implementation.

Pairing when not necessary

There are scenarios when pair-programming might not be optimal. For instance, while working on complex technical spikes, pairing might become a distraction when a focused investigation into the codebase is required. Conversely, pairing is overkill on low-complexity stories that are known to only require minor changes.

Developer Stickiness

Effective pairing has the side effect of developers forming strong bonds with each other. As a result, the same developer pairs would end up working on multiple stories consecutively if a rotation is not practised. Unrotated pairs have the tendency to develop a tunnel vision which reduces the effectiveness of pair programming.
Developers should make it a point to switch pairing partners regularly.

Incorrect pairing dynamics

It’s easy for expert-novice pairings to end up becoming teacher-student relationships. A teacher-student pairing assumes that the student is only present to learn and practise, rather than to produce actual deliverable work. For instance, a ‘teacher’ might constantly rework the student’s code as part of the ‘teaching’ process, or worse still, the ‘teacher’ performs all the coding while the student only takes notes. In such scenarios, fellow developers in the team should step in and offer to switch pairing at the earliest opportunity.

Pair-Programming, as with other XP practices, is not a magical pill that will transform team dynamics or software quality overnight. Teams adopting pair-programming for the first time might want to introduce it gradually by applying it to more complex development stories. Pair-Programming promotes the Agile principle of favouring face-to-face conversation, and for engineering teams looking to achieve technical excellence, Pair-Programming is definitely something worth considering.

Dissecting Webpack: ...

In the previous post, we discussed setting up the Webpack configuration file and kick-starting it with the Webpack development server. But what makes Webpack a one-stop bundler are the loaders and plugins. Loaders help Webpack transform code or assist during development, whereas plugins come in at the end, when the bundling is happening, to enhance or optimise the performance of the application.

The following figure shows some recommended loaders and plugins.

Credit: Roohiya Dudukela

As some of you may have noticed in the configuration chart above, there are two attributes that have not been mentioned yet:

    module: {rules: []},
    plugins: []

The loaders configuration goes into the module.rules array, while the plugins configuration goes into… the plugins array. So that was easy. Let’s dive right in, starting with loaders.

Loaders

Babel Loader

    npm install babel-loader --save-dev

This loader transpiles code newer than ES5 down to ES5. Its configuration informs Webpack to use babel-loader only on jsx files and to exclude node modules. Along with the loader, there are some dependencies to be installed:

    npm install babel-core --save-dev

One may ask why we need babel-loader when we already have babel-polyfill. To understand this, we need to look at the function of each tool. This video has a good explanation of the difference.

In short, babel-loader takes care of transforming syntax that is above ES5, while babel-polyfill is called to create, on the fly, new functions and methods that browsers don’t support. The two complement each other and are both needed to handle different parts of modern JS.

For example, ‘const’ is transpiled to ‘var’, and an arrow function is transpiled to an anonymous function.
    const myArray = new BetterArray( [1, 2, 3] )
    → var myArray = new BetterArray( [1, 2, 3] )

    var nums = list.map( ( v, i ) => v + i )
    → var nums = list.map( function(v, i) { return v + i } );

CSS Loader & Style Loader

Since Webpack only understands JavaScript, we need to add loaders to tell it how to handle CSS files.

    npm install css-loader --save-dev

CSS loader looks into all the CSS imports and URLs and returns a string (as shown below) that becomes part of the main JS bundle.

    ... \"body {\\n background-color: pink;\\n\"...

Since it is part of the JS file, the browser cannot recognise and extract the CSS code from the JS, so the styles will not be applied.

    npm install style-loader --save-dev

What we need is a style loader that takes the CSS string out of the JS bundle and injects it into a style tag in the HTML file.

With a configuration that applies the style and css loaders (in that order) to all files with a .css extension, excluding those in node modules, the styles are applied and CSS is taken care of.

File Loader & URL Loader

Another set of assets that we need to explicitly tell Webpack how to handle is images and fonts. We usually manage images either by injecting them inline in <img> tags or by storing them on a server and making network requests to render them.

    npm install file-loader --save-dev

File loader only alters the path name of the file to give it a public URL. It looks for all the imports and URLs of images being used and formats the path name accordingly.

Configured for a set of extensions, the file-loader loads all matching assets and places the images in the ‘/images’ folder with the name format [name]_[hash].[ext].
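The three loader rules described so far might be written roughly as follows. This is a sketch under stated assumptions rather than the article's original configuration: the image extension list is assumed, and `loadersFor` is a hypothetical helper added only to show which rule would match a given file.

```javascript
// Sketch of module.rules for the Babel, CSS/style and file loaders.
// Loader names are plain strings, so this object can be inspected
// without webpack or the loaders being installed.
const rules = [
  {
    // Transpile .jsx files with Babel, skipping third-party code
    test: /\.jsx$/,
    exclude: /node_modules/,
    use: 'babel-loader'
  },
  {
    // Loaders in "use" are applied from right to left:
    // css-loader resolves the CSS, then style-loader injects it
    test: /\.css$/,
    exclude: /node_modules/,
    use: ['style-loader', 'css-loader']
  },
  {
    // Assumed extension list; files are emitted as images/[name]_[hash].[ext]
    test: /\.(png|jpe?g|gif|svg)$/,
    use: {
      loader: 'file-loader',
      options: { name: 'images/[name]_[hash].[ext]' }
    }
  }
];

const config = { module: { rules } };

// Hypothetical helper, for illustration only: list the "use" entries
// of every rule whose test matches and whose exclude does not
function loadersFor(file) {
  return config.module.rules
    .filter((rule) => rule.test.test(file) && !(rule.exclude && rule.exclude.test(file)))
    .map((rule) => rule.use);
}

console.log(loadersFor('src/App.jsx'));
console.log(loadersFor('src/styles.css'));

module.exports = config;
```

Keeping each asset type in its own rule makes it easy to swap one loader later, for example replacing file-loader with url-loader, without touching the others.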
Below are a couple of examples:

    background-image: url('images/dog.png') → images/dog_436bd585...png
    import Cat from 'images/cat.jpg' → images/cat_875ds32132dsda3...jpg

However, if we place all the images on a server, the overhead of making multiple network requests could dampen performance. There is a more advanced loader, a wrapper around file-loader, known as url-loader.

    npm install url-loader --save-dev

This loader handles images based on their size. Its configuration looks a lot like the one for file-loader, but the interesting part is the ‘limit’ attribute, which states the size limit of the image.

    < 8kb ? <img src='background-image: url(data:image/png;base64,iGwfd..)' /> : images/dog_876bd585bdc8a5cc40633ffefdb7a4a5.png

If, in this case, an image is smaller than 8kb, url-loader converts it to a base64 string and injects it into the <img> tag; otherwise it falls back to file-loader, which creates a public URL for a file stored on the server. Images up to a certain size can be converted to base64 without slowing down the application. The limit can be tuned to find the best-performing value for the project.

Standard Loader

This is an optional loader to help in development. It lints code based on Standard JS rules. It runs as a pre-loader and lints all .jsx files, excluding those in node modules.

In 4.0

To use loaders in Webpack 4.0 you still have to create a configuration file, and most of the configuration remains the same.

Plugins

HTML Webpack Plugin

This plugin helps create an HTML file from scratch and injects variables into it.

    npm install html-webpack-plugin --save-dev

Variables such as a title, a favicon and a mount id are set in the plugin options.
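As a rough sketch of the url-loader rule and the plugin variables described above, the two pieces could look like this. All concrete values (extension list, file paths, title, mount id) are placeholders, and the options object is what would be passed to `new HtmlWebpackPlugin(options)` in the plugins array; the plugin itself is not instantiated here so that the snippet runs without extra packages.

```javascript
// url-loader rule: inline small images as base64, otherwise fall back to file-loader
const urlLoaderRule = {
  test: /\.(png|jpe?g|gif)$/, // assumed extension list
  use: {
    loader: 'url-loader',
    options: {
      limit: 8192, // size threshold in bytes (8 kB)
      name: 'images/[name]_[hash].[ext]' // naming for files above the limit
    }
  }
};

// Options for html-webpack-plugin; every value is a placeholder.
// They become available in the template as htmlWebpackPlugin.options.<key>.
const htmlPluginOptions = {
  template: './index.html',
  title: 'My App',
  favicon: './favicon.ico',
  appMountId: 'app'
};

console.log(urlLoaderRule.use.options.limit);
console.log(Object.keys(htmlPluginOptions));
```

Raising `limit` trades fewer network requests for a larger bundle, so it is worth measuring before settling on a value.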
These variables can be used in the index.html file, for example:

    <title><%= htmlWebpackPlugin.options.title %></title>
    <link rel="icon" href="<%= htmlWebpackPlugin.options.favicon%>">
    <div id="<%= htmlWebpackPlugin.options.appMountId%>"></div>

Extract Text Webpack Plugin

This plugin extracts text of any kind into a separate file. The configuration uses CSS loader to resolve all the CSS, but instead of injecting it into style tags with the help of style-loader, the plugin takes the CSS string and pushes it into another file. The extracted file is loaded in parallel with the bundled file.

Doing this has its pros and cons. Let’s go through them.

The pros are that having the CSS in a separate file rather than in style tags reduces the number of style tags used and, consequently, the bundle size. With a smaller bundle, the load time is faster. As mentioned earlier, the CSS file loads in parallel, which eliminates the flash of unstyled content.

However, enabling this extraction stops hot reload from working. And since we have another file, compilation takes longer, depending on its size.

It is recommended to use this plugin only in production mode.

In 4.0

Extract-text-webpack-plugin is deprecated and is replaced by mini-css-extract-plugin.

Common Chunks Plugin

Another very important aspect of bundling the project is code splitting. This can substantially optimise the performance of the application. This plugin helps split out common code in the project into chunks, which can be loaded on demand or in parallel. The aim is a smaller bundle size and loading prioritisation.

Credit: Roohiya Dudukela

A small js file can grow to be really big, which does not bode well for any application.
So it can be chunked into, for example:

- Multiple entries
- Chunks with code common to multiple entries (Lodash)
- Vendor libraries that do not change as frequently as the main codebase (React)

In this example, we add another entry to the Webpack bundle, ‘vendor’, which chunks out third-party libraries. With this separate chunk, our main chunk is reduced considerably!

In 4.0

CommonsChunkPlugin has been deprecated; instead, the optimization.splitChunks and optimization.runtimeChunk options can be used. This is possible with the new plugin, SplitChunksPlugin. Instead of requiring you to specify manually what to chunk, the plugin is smart enough to identify modules that need to be chunked.

UglifyJS Webpack Plugin

This is a familiar plugin which obfuscates code and handles dead code elimination.

Mode

Webpack.common.js

A common Webpack file simply consists of configuration that is shared between dev and production:

- Babel-polyfill
- Babel-loader
- CSS & Style loader
- URL loader

Webpack.dev.js

- Development server with hot reload
- Standard loader
- CSS loader with sourcemap (for debugging purposes)

Webpack.prod.js

- CSS extraction
- Uglify & dead code elimination
- Code splitting

Webpack Merge

We can combine the common and dev configurations, and likewise the common and prod configurations, with Webpack-merge.

In 4.0

Without needing Webpack merge, we can use the scripts configuration to specify the --mode flag, and Webpack will take care of the rest.

    "dev": "webpack --mode development",
    "build": "webpack --mode production"

For a complete react-redux webpack configuration, please take a look at this Github Repo for guidance.

So hopefully this has been an enlightening journey that leaves you feeling more in control of Webpack configurations. No more running away from this!

Dissecting Webpack: ...

Many front-end developers shy away from Webpack rather than tackle it head-on. Webpack has so many configuration options that tinkering with it can feel like it might break the application. So this article attempts to simplify the major concepts in building a frontend project, specifically with React. Sprinkled through the article are little tips to prepare for Webpack 4.0!

What is Webpack?

Overview

A frontend application has multiple files: .js, .jsx, .png, .jpg, .css, .ttf. We cannot simply take this bunch of files and dump it on the production web server; the load time and overheads would be ghastly. What we need is a single bundled JS file that holds the structure and logic of the components and application, a single CSS file for styles, an HTML file to render the DOM, and an assets folder for images and fonts.

Many tools and task runners have helped with the bundling process, but Webpack has emerged as the one-stop solution for many React frontend developers.

Credit: Roohiya Dudukela

Setup

To get started, we have to install webpack and webpack-cli.

    npm install webpack -g
    npm install webpack-cli -g

We also need a basic project to bundle up. For that we can create a new folder called ‘react-webpack’.

    mkdir react-webpack

We need a package.json file in our project, which will later be required to define our start scripts. Change directory into the ‘react-webpack’ folder and run npm init.

    cd react-webpack
    npm init

Create an index.js file that will serve as the entry point for Webpack.

    touch index.js

In index.js, we can add a simple console.log.

    console.log('Hello World')

We can now use the webpack-cli to bundle this index.js into dist/bundle.js.
    webpack index.js dist/bundle.js

And the bundled code, in bundle.js, would look something like this:

    ...(function(module, exports) {\n\neval(\"console.log('Hello Worl...

*Update: in the latest webpack-cli version, simply calling ‘webpack index.js’ will auto-generate a distribution folder and a bundled file called ‘main.js’.

At this point the project should contain package.json, index.js, and a dist folder holding the bundled file.

But as the code base grows, it becomes impractical to keep invoking the webpack-cli by hand for every change we make. Instead, Webpack lets us feed it a configuration object built around 5 important features:

- Entry / Output
- Dev Server
- Loaders
- Plugins
- Mode

Entry / Output

First up, we need to create a file for the configuration.

    touch webpack.config.js

That file requires the ‘webpack’ library, declares the config object, and finally exports it so that it can be fed to Webpack.

Next, you guessed it right, we have to populate the configuration object.

Context

As a good practice, all the source files should be in a separate folder and not in the root folder. Specifying context tells Webpack the base directory in which to look for the source files, starting from ‘index.js’. This eliminates the need to add dots and slashes, using relative paths, to get to a specific file.

To set this up, we require the ‘path’ library to help map out the absolute paths. ‘path’ is a core Node.js module, so it does not need to be installed separately.

Entry

Since we have set up the context, or base directory, the entry file can be stated relative to it, as ‘./index.js’ (or plain ‘index.js’ when the source folder is also listed under resolve, described below). This is where Webpack starts to chart its dependency graph.

Output

The output file is what Webpack churns out at the end: the compact bundle of the whole application with its dependencies intact.

Resolve

This attribute tells Webpack which files and folders to look into when building its dependency graph.

Babel

Before we continue to the next step, which involves writing fancy JavaScript code, some Babel configuration is needed.
Not all browsers are able to handle ES6 syntax. Therefore, we need to bridge the gap by adding ‘babel-polyfill’ to the entry attribute. For the setup, babeljs.io has pretty neat steps and explanations.

Babel-polyfill adds missing functions at runtime in browsers that lack support for JavaScript features beyond ES5. You can take a look at the browser compatibility table to check whether babel-polyfill is required for the project.

This is not the complete Babel configuration; we will add babel-loader later on to provide complete support for ES6 and above.

The following, along with babel-polyfill, need to be installed for React projects.

    npm install --save babel-polyfill
    npm install --save-dev babel-preset-env
    npm install --save-dev babel-preset-react
    npm install --save-dev babel-preset-stage-0

To set up Babel, add a .babelrc file in the project folder with this piece of code:

    { "presets": [ "env", "react", "stage-0" ] }

This specifies the Babel presets. A preset is a set of plugins that supports particular language features. For example, the ‘react’ preset adds support for JSX, and ‘env’ covers ES6 and later features (it supersedes the older ‘es2015’ preset and matches the babel-preset-env package installed above).

JavaScript features beyond ES7 exist in various ‘stages’, as defined by TC39. The ‘stage-0’ preset enables features that are still at the idea or proposal phase, which helps us write neat code.

By this point, the configuration sets context, entry (including babel-polyfill), output, and resolve.

In 4.0

No entry and output points need to be defined explicitly: Webpack will take index.js in the src folder as the default entry. (Though you can still override the default by stating the path in the script in package.json.)

Dev Server

Setup

It goes without saying that we need a development server to launch our application in the browser. Webpack provides its own dev server, webpack-dev-server. Install it, along with webpack and webpack-cli, as dev dependencies of this project.
    npm install webpack-dev-server --save-dev
    npm install webpack --save-dev
    npm install webpack-cli --save-dev

All that needs to be done is to add a devServer attribute to the above configuration. With this, we will have a simple dev server running that serves files from the current directory.

Run server

Now, to be able to run the server, we need to tap into the package.json of the application. In the “scripts” attribute of the object, we add a command to run webpack-dev-server with the hot option enabled. The hot option enables Hot Module Replacement: the server watches for changes in the code and updates the page in the browser automatically.

    "scripts": { "start": "webpack-dev-server --hot" }

To run this, we need to call the command:

    npm start

And voila! We can see the application running on localhost:3000 (provided the devServer attribute sets the port to 3000; webpack-dev-server listens on port 8080 by default).

In 4.0

This is as per Webpack 4.0 specifications as well.

For the initial setup, you may look at this Github Repo for guidance.

Up till this point, we have only scratched the surface of Webpack. There’s still much to be explored. We are just getting to the exciting parts. The magic of Webpack lies in the ability to use loaders and plugins, which will be covered in Part II of this series.
