Is Partition By more performant than Group By?

In the past I’ve searched the internet and read some tutorials, and the message I keep getting (also from GitHub comments on SQL-related project repositories, like this one from linq2db) is that the newer, modern window functions are more efficient than old, traditional grouping.


By the way, regarding linq2db and similar patterns (SQLAlchemy in Python and so on): notice how it lets you use the new, powerful features, so the abstraction doesn’t hide complexity when digging into details is your job.

Traditional, plain old way: grouping

For example, this could be the first part of the simplified query.

SELECT DISTINCT
    csdc.cargo_id, csh.delivery_id, csh.parcel_id, ab.deal_tracking_num,
    parcel.price AS parcel_price
FROM comm_schedule_header csh
INNER JOIN ab_tran ab ON ab.ins_num = csh.ins_num
INNER JOIN (SELECT deal_tracking_num, MAX(ins_num) AS ins_num
            FROM ab_tran GROUP BY deal_tracking_num) gab
    ON gab.ins_num = csh.ins_num

The effort to avoid the DISTINCT and to optimize the query further is not worth it (and it is not even certain the result would be much better than the original, eventually).
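To make the grouping pattern concrete, here is a toy reproduction in Python’s built-in sqlite3 module (table shape and data are invented, mirroring only the ab_tran subquery): a GROUP BY derived table picks the max ins_num per deal, and the self-join brings the matching row back.

```python
import sqlite3

# Toy stand-in for the ab_tran table of the post (data invented):
# each deal_tracking_num has several ins_num versions, we want the latest.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ab_tran (deal_tracking_num INTEGER, ins_num INTEGER)")
con.executemany("INSERT INTO ab_tran VALUES (?, ?)",
                [(100, 1), (100, 2), (100, 3), (200, 7), (200, 8)])

# Traditional pattern: GROUP BY to find MAX(ins_num), then self-join on it
rows = con.execute("""
    SELECT ab.deal_tracking_num, ab.ins_num
    FROM ab_tran ab
    INNER JOIN (SELECT deal_tracking_num, MAX(ins_num) AS ins_num
                FROM ab_tran GROUP BY deal_tracking_num) gab
        ON gab.ins_num = ab.ins_num
    ORDER BY ab.deal_tracking_num
""").fetchall()
print(rows)  # [(100, 3), (200, 8)]
```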

New, modern and more performant approach.

The modernized, completed and reduced SQL, after deleting lines not relevant to the topic, is:

select cargo_id, delivery_id, parcel_id, deal_tracking_num, parcel_price
from (SELECT csdc.cargo_id, csh.delivery_id, csh.parcel_id, ab.deal_tracking_num,
parcel.price as parcel_price,
row_number() over(
    partition by ab.deal_tracking_num, csdc.cargo_id, csh.delivery_id, csh.parcel_id
    order by ab.ins_num desc) as ab_rn
FROM comm_schedule_header csh
INNER JOIN ab_tran ab ON  ab.ins_num=csh.ins_num
INNER JOIN comm_sched_delivery_cmotion csdc ON csh.delivery_id=csdc.delivery_id
left join (
    select parcel_id, ins.price, row_number()
           over(partition by fee.fee_id, fee.parcel_id order by fee.ins_num desc) as RN
    from INS_FEE_PARAM fee
    inner join ins_price ins on ins.ins_num = fee.ins_num
    and ins.param_seq_num=fee.param_seq_num and ins.pricing_source_id=fee.fee_seq_num
    and ins.pricing_vol>0 and ins.price>0
    where fee.fee_def_id=2
    ) parcel on parcel.RN=1 and parcel.parcel_id=csh.parcel_id
WHERE csh.bav_flag=1
AND csh.volume_type = 8
AND csh.last_update>=DATE '2020-03-01' -- example of parameter selection
AND csh.delivery_id<>0
ORDER BY  csdc.cargo_id, csh.delivery_id, csh.parcel_id, ab.deal_tracking_num) dash
where dash.ab_rn = 1;
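The same toy dataset used for the grouping sketch can be filtered with the window-function pattern instead. This is only a sketch in Python’s sqlite3 (SQLite supports window functions from version 3.25), not the Oracle query above:

```python
import sqlite3

# Same invented ab_tran data as in the grouping sketch; here the latest
# ins_num per deal is picked with row_number() over a partition.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ab_tran (deal_tracking_num INTEGER, ins_num INTEGER)")
con.executemany("INSERT INTO ab_tran VALUES (?, ?)",
                [(100, 1), (100, 2), (100, 3), (200, 7), (200, 8)])

latest = con.execute("""
    SELECT deal_tracking_num, ins_num FROM (
        SELECT deal_tracking_num, ins_num,
               ROW_NUMBER() OVER (PARTITION BY deal_tracking_num
                                  ORDER BY ins_num DESC) AS rn
        FROM ab_tran) dash
    WHERE dash.rn = 1
    ORDER BY deal_tracking_num
""").fetchall()
print(latest)  # [(100, 3), (200, 8)]
```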

When I time it I see, for example, 3731 results in under 10 seconds from Oracle SQL Developer version 18.1 as a client, for a certain date parameter selection (as written above, DATE '2020-03-01'). I consider this timing unreliable, though, also because I have to press CTRL + END to fetch all rows (among other reasons, e.g. the variance of statistics, possible caching effects, etc.).

My DB is Oracle Database 12c Enterprise Edition, 64-bit, but of course this topic applies much more broadly.

Scientific method, query plan comparison

A more scientific approach, as suggested by DBA experts, is to compare the query plans. Given the permissions I have, I compared the explain plans of the two queries and looked roughly at the total cost of the SELECT statement. The cost of the window-function version, 12793, as per the HTML rendered here, is the minimum among the compared explain plans, so this again confirms that the SQL above is, up to now, the best one: the window functions appear more performant than plain grouping.
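If you do not have an Oracle instance at hand, the same methodology can be rehearsed with SQLite’s EXPLAIN QUERY PLAN from Python (toy schema, invented data); in Oracle the equivalent is EXPLAIN PLAN FOR followed by DBMS_XPLAN.DISPLAY.

```python
import sqlite3

# Toy stand-in: ask the engine how it intends to execute the grouping query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ab_tran (deal_tracking_num INTEGER, ins_num INTEGER)")

plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT deal_tracking_num, MAX(ins_num)
    FROM ab_tran GROUP BY deal_tracking_num
""").fetchall()
for row in plan:
    print(row)  # each tuple describes one step of the plan
```

The output format differs between engines, but the comparison method is the same: run both candidate queries through the planner and compare the steps and costs.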

This is the explain plan, exported as html.

C# vs Python vs F#… and the science of R0 in epidemiology

Recently I’ve studied many tutorials and video courses about Python. I’ve also done a number of practical exercises with real-world projects in Python, on my Linux Fedora notebook and even in CPython on Android.

Here, in this post, the focus is on Pandas, since at this moment we are all concerned with stats and with extractions from either Excel files or any big data (hype). By the way, I will add a couple of mentions of scientific papers that mathematically and seriously explain the context and the limits of R(0) and R(t), the reproduction number.

Now, in terms of quality of Excel data extraction, nothing is better than C# or Python openpyxl. For example, you can easily and accurately get a DateTime by composing together the values of multiple ClosedXML cells. Or, in Python openpyxl:

>>> worksheet = workbook.worksheets[0]
>>> type(worksheet)
<class 'openpyxl.worksheet.worksheet.Worksheet'>
>>> worksheet.cell(2,24).value
datetime.datetime(2020, 4, 1, 0, 0)
>>> worksheet.cell(2,25).value
datetime.time(12, 12, 3)
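The “composing together” of cell values mentioned above can be sketched in plain Python: given a date-bearing cell and a time-bearing cell, as openpyxl returns them in the session above, datetime.combine merges them into one timestamp (the cell values here are invented sample data).

```python
from datetime import datetime, time

# Values as openpyxl would return them from two adjacent cells
# (invented sample, matching the REPL session above):
cell_date = datetime(2020, 4, 1, 0, 0)   # a date-bearing cell
cell_time = time(12, 12, 3)              # a time-bearing cell

# Compose the two cells into a single timestamp
stamp = datetime.combine(cell_date.date(), cell_time)
print(stamp)  # 2020-04-01 12:12:03
```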


Python Pandas makes sense only for big data with a standard, well-defined, and simple input format. F# fits such standard input formats as well, and pure, abstract, logical aggregation, especially inside .NET web platforms. C# or Python openpyxl are the most powerful for the disparate formats typical of business use cases. Notice, however, that performance might be an issue for the latter:

Memory use is fairly high in comparison with other libraries and applications and is approximately 50 times the original file size, e.g. 2.5 GB for a 50 MB Excel file.

So C# appears to be the best, eventually.

Finally, a note about two interesting scientific papers, worth reading:

Notes On R0, James Holland Jones, Department of Anthropological Sciences, Stanford University, from which I mention:

This result has a nice geometric interpretation. The ESS virulence occurs where a line rooted at the origin is tangent to the curve that relates β to δ. This result is known as the Marginal Value Theorem and has applications in economics and ecology as well as epidemiology.

Theory versus Data: How to Calculate R0?
Romulus Breban, Raffaele Vardavas, Sally Blower. From this paper, another very important and instructive lesson for our time:

Another approach (which is more commonly used) is to obtain R0 from population-level data, namely cumulative incidence data… It is very important to note that the individual-level modeling assumptions cannot be verified using population-level data (i.e., they remain hypothetical). ODE models are formulated in terms of disease transmissibility and progression rates at the population level. These parameters are obtained by fitting the model to population-level data; their relation to the individual-level processes may be quite complex and is generally unknown… Therefore, population-level predictions based upon an ODE model that use the R0 value found by contact tracing as a threshold parameter may be inaccurate.

Our novel results have significant implications for understanding the dynamics of outbreaks of infectious diseases, particularly for the biological understanding of the transmission dynamics of the pathogen, estimating the severity of outbreaks, making health policy decisions, and designing epidemic control strategies.




#Angular #Reactive Forms with @MaterialUI

This is extremely modern and cool.


The demo app is published on firebase.

Look at the final code: it is a couple of Angular data bindings, from a reactive form control to an interpolation through a function. This is very similar to Model-View-ViewModel in desktop apps. My favorite F# stack solution, WebSharper, has reactive views and vars, of course (I don’t like the Elmish approach, instead).

Here the point is that I like Angular Material UI rather than React’s, so I’m forced to use TypeScript. See this thread for clarifications.

By the way, moment.js is the de facto standard for dates on the frontend.

I’ve shared the whole project here on stackblitz!

Mutable, Shadowing, Recursion

After some days of analysis and complex programming examples, based on Advent of Code 2019, I have derived some useful theorems about functional programming.

There are 3 concepts we need to understand: mutability, shadowing and recursion.


F# has different forms of variables: immutable and mutable (plus byref like C# and ref like OCaml).

First theorem of shadowing

You can always represent the changes of an immutable functional-style variable with shadowing.

Second theorem of mutability

Aside from the while loop (and record-type fields, if you need a single instruction for multiple assignments), you can always represent the changes of a mutable imperative-style variable with shadowing.


You have 2 options to transform an imperative-style while loop with mutable variables into functional-style code:

  1. tail recursion, when doable; but this is not always possible in F# (as opposed to Haskell) without reintroducing stack explosion in some edge cases, because F# is a strict, eagerly evaluated FP language while Haskell is non-strict and lazy
  2. you can always use Seq/List/Array .unfold, and in this way you can represent the imperative loop as a functional state machine (see most of the AoC 2019 solutions on GitHub for practical, albeit complex, examples)
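Although the theorems are stated for F#, the two options can be sketched in Python too, on a hypothetical sum-to-n loop. Note that Python does not eliminate tail calls, so option 1 is illustrative only, and the unfold helper hides the one remaining loop, much like Seq.unfold does.

```python
# The imperative-style while loop with mutable variables
def sum_to_imperative(n):
    total, i = 0, 0
    while i <= n:
        total += i
        i += 1
    return total

# Option 1: tail-recursive form (each "change" of total and i becomes
# a new argument binding; Python won't optimize this, F#/Haskell can)
def sum_to_recursive(n, i=0, total=0):
    if i > n:
        return total
    return sum_to_recursive(n, i + 1, total + i)

# Option 2: an unfold-style state machine; f maps state -> (value, next state)
# or None to stop, like Seq.unfold in F#
def unfold(f, state):
    while True:
        step = f(state)
        if step is None:
            return
        value, state = step
        yield value

def sum_to_unfold(n):
    return sum(unfold(lambda i: (i, i + 1) if i <= n else None, 0))

print(sum_to_imperative(100), sum_to_recursive(100), sum_to_unfold(100))
```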



F# type providers are awesome

For example, consider the csv format, that is quite common in business interfaces.

Well, the F# CsvProvider has all the options you may need to load it.

type PL =
    CsvProvider<
        Sample = sample,
        Separators = ";", SkipRows = 1, HasHeaders = false,
        Schema = "CargoID(string),CurrencyType(string),Currency(string),Amount(decimal)">

Now, we borrow a function from Reed Copsey

let tryToInt s = 
    match System.Int32.TryParse s with
    | true, v -> Some v
    | false, _ -> None

and then we can write a concise, functional way to keep only the lines with a parsable int in the cargo id field. We are indeed unleashing the power of F# Type Providers when we pass the safely typed row in the signature of the following.

let withCargoID (rows: PL.Row[]) =
    rows
    |> Array.choose (fun r ->
        tryToInt r.CargoID
        |> Option.bind (fun _ -> Some r))

and the rest is an easy exercise of grouping

type CurrencyGroup = {CurrencyType: string; Currency: string}

let parse (filename: string) =
    let pl = PL.Load filename
    let rows = pl.Rows |> Seq.toArray
    let header = rows |> Array.head
    printfn "rows #: %d with cargoes %d" rows.Length (withCargoID rows).Length
    printfn "first line => CargoID: %s, CurrencyType: %s, Currency: %s, Amount: %M!" header.CargoID header.CurrencyType header.Currency header.Amount
    let currGroups =
        rows
        |> Array.groupBy (fun r ->
            {Currency = r.Currency; CurrencyType = r.CurrencyType})
    currGroups
    |> Array.iter (fun (g, r) ->
        printfn "Currency %s CurrencyType %s #: %d with cargoes: %d" g.Currency g.CurrencyType r.Length (withCargoID r).Length)
    currGroups
    |> Array.sumBy (fun (_, r) -> r.Length)
    |> printfn "total records in all currencies %d"
    currGroups
    |> Array.sumBy (fun (_, r) -> (withCargoID r).Length)
    |> printfn "total with cargoes in all currencies %d"
    printfn "parsed fine!"
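For comparison, here is a rough stdlib-Python sketch of the same pipeline (sample data invented, with the same CargoID;CurrencyType;Currency;Amount shape): filter the rows with a parsable cargo id, then group and count per currency pair.

```python
import csv
import io

# Toy ';'-separated data shaped like the PL schema above (values invented)
raw = """1001;Spot;USD;10.5
abc;Spot;USD;3.0
1002;Fwd;EUR;7.25
"""

def try_to_int(s):
    # Python cousin of the F# tryToInt above: Some/None becomes int/None
    try:
        return int(s)
    except ValueError:
        return None

rows = list(csv.reader(io.StringIO(raw), delimiter=";"))
with_cargo = [r for r in rows if try_to_int(r[0]) is not None]

# Group by the (CurrencyType, Currency) pair, like the CurrencyGroup record
groups = {}
for cargo, ctype, curr, amount in rows:
    groups.setdefault((ctype, curr), []).append((cargo, amount))

print(len(rows), len(with_cargo))  # 3 2
for key, members in groups.items():
    print(key, len(members))
```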

How to await an async callback

I’m trying to await a callback that is fired when a button is pressed.


The important point is that I want to wait for the callback with a simple await, without reshaping the code, which could be a loop of batches requiring a user confirmation at each step.

We can also add a cancellation option if the user closes the dialog.

BatchProgressVM progressVM = new BatchProgressVM();
BatchProgress batchProgress = new BatchProgress();
progressVM.ProgressValue = 0;
progressVM.Log = msg => this.Status = msg;
progressVM.ProgressMessage = "Click Next to continue";
batchProgress.DataContext = progressVM;
CancellationTokenSource cts = new CancellationTokenSource();
batchProgress.Closed += OnClose;
Dispatcher dispatcher = Dispatcher.CurrentDispatcher;
// cancel the batch when the user closes the dialog
void OnClose(object sender, EventArgs e) => cts.Cancel();

In other words I want to achieve the following:

// e.g. it can be used in a loop
internal async Task BatchLogic(int num, int tot, CancellationTokenSource cts, Dispatcher dispatcher)
{
    taskNum = num;
    totTask = tot;
    dispatcher.Invoke(() =>
    {
        ProgressMessage = $"task {taskNum - 1} of {totTask} done! Click Next!";
        IsBusy = false; // enable Next button for user to confirm going on
    });
    cts.Token.Register(() =>
    {
        if (callback != null) callback.Invoke(Task.FromResult(false));
    });
    // e.g. await that the user confirms
    await Task.Factory.FromAsync(beginMethod, endMethod, null);
    ProgressValue = 100 * taskNum / totTask;
}

with the following definitions

// define a closure for the callback
AsyncCallback callback; 
//pass the callback to the closure from the beginMethod
private IAsyncResult beginMethod(AsyncCallback callback, object state)
{
    this.callback = callback;
    return Task.FromResult(true);
}

private void endMethod(IAsyncResult obj)
{
    var taskRes = obj as Task<bool>;
    // check whether it completed or was cancelled
    bool res = taskRes?.Result ?? true;
    if (res)
    {
        ProgressMessage = $"task {taskNum} of {totTask} working! Pls wait";
        IsBusy = true; // disable Next button while working
    }
}

// and finally invoke it from the event (i.e. the Command method in the MVVM for WPF)
private async void RunNext()
{
    if (callback != null)
        callback.Invoke(Task.FromResult(true));
}

As I said, the usage is very convenient inside a loop

await progressVM.BatchLogic(batch_num, tot_batch, cts, dispatcher);
if (cts.IsCancellationRequested) break;
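The same wait-for-a-click-inside-a-loop shape exists in Python’s asyncio, where a Future plays the role of the AsyncCallback closure. In this sketch (all names invented) a timer stands in for the user’s button press; a real Close handler would call confirm.cancel() instead.

```python
import asyncio

# asyncio sketch: each loop step awaits a Future that a UI callback
# would complete; here a timer stands in for the button click.
async def batch_loop(total_steps):
    done = 0
    loop = asyncio.get_running_loop()
    for step in range(total_steps):
        confirm = loop.create_future()  # "the callback closure"
        # In a real app the Next-button handler calls confirm.set_result(True)
        # and the dialog's Close handler calls confirm.cancel().
        loop.call_later(0.01, confirm.set_result, True)
        await confirm                   # plain await, no reshaping of the loop
        done += 1
    return done

result = asyncio.run(batch_loop(3))
print(result)  # 3
```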

In the end (e.g. after the async loop) you can show the results and unsubscribe from the Closed event.

Status = (cts.IsCancellationRequested ? "Cancelled!" : "Done!");
batchProgress.Closed -= OnClose;

Autogenerating columns: SQLProvider and WebSharper.

A real world example

This is a short follow-up to my previous Modern CSS3 article.
We want to see how to autogenerate the table columns from the DB to the web in a few lines, with type checks all the way through.

The DB

I’ve recently discovered the cool bang-bang operator for the left join in SQLProvider.

let context = Sql.GetDataContext(ConfigurationManager.ConnectionStrings.["AppDB"].ConnectionString)

let tradesQuery =
    query {
        for t in context.OilPhysical.EndurTradeValid do
        // the left-hand tuple was lost in publishing; field names assumed here
        join n_del in (!!) context.OilPhysical.EndurNominationValid on
            ((t.DealNumber, t.ParcelGroup, t.ParcelId) = (n_del.DeliveryDealNumber, n_del.DeliveryParcelGroup, n_del.DeliveryParcelId))
        join n_rec in (!!) context.OilPhysical.EndurNominationValid on
            ((t.DealNumber, t.ParcelGroup, t.ParcelId) = (n_rec.ReceiptDealNumber, n_rec.ReceiptParcelGroup, n_rec.ReceiptParcelId))
        take 90
        select (t, n_del.CargoId, n_rec.CargoId)
    }

let trades =
    tradesQuery |> Seq.toArray
    |> Array.map (fun (t, n_del, n_rec) ->
        // the entity's (column name, boxed value) pairs; assumed lost from the original
        t.ColumnValues
        |> Seq.append ([("Del Cargo ID", n_del :> obj)] |> Seq.append [("Rec Cargo ID", n_rec :> obj)]))

The Web

While for a desktop app you would need property bindings and plain old classes, the modern type-provider types are a nice fit for the awesome WebSharper DSL.
The conclusion is that you can easily configure an agile web page with all the fancy features you need.

        if Array.isEmpty ServerModel.trades then
            p [] [text "No trades selected"]
        else
            Doc.Concat [
                div [] [client <@ Client.Inspector() @>];
                // attr.id is assumed for the bare "demo-table" string in the published snippet
                table [attr.``class`` "table"; attr.``data-`` "role" "table";
                       attr.``data-`` "horizontal-scroll" "true"; attr.id "demo-table"] [
                    yield thead [] [
                        tr [] [
                            for (k, v) in ServerModel.trades.[0] do
                                match v with
                                | :? int
                                | :? decimal -> yield th [attr.``class`` "sortable-column"; attr.``data-`` "format" "number"] [text k]
                                | _ -> yield th [attr.``class`` "sortable-column"] [text k]
                        ] ]
                    yield tbody [] [
                        for tup in ServerModel.trades do
                            yield tr [] [
                                for (k, v) in tup do
                                    yield td [] [text <| string v] ] ] ] ]
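The column autogeneration itself is independent of WebSharper. Here is a small stdlib-Python sketch of the same idea (trade data invented): rows arrive as (header, value) pairs, and the runtime type of each value decides the per-column attributes, like the :? int / :? decimal type tests above.

```python
from decimal import Decimal

# Rows as (column header, value) pairs, like the appended ColumnValues above
# (invented sample trade)
trades = [
    [("Deal", 42), ("Price", Decimal("9.5")), ("Trader", "Alice")],
]

def th(key, value):
    # numeric columns get a data-format attribute, all columns are sortable
    cls = "sortable-column"
    if isinstance(value, (int, Decimal)):
        return f'<th class="{cls}" data-format="number">{key}</th>'
    return f'<th class="{cls}">{key}</th>'

head = "".join(th(k, v) for k, v in trades[0])
body = "".join("<tr>" + "".join(f"<td>{v}</td>" for _, v in row) + "</tr>"
               for row in trades)
html = f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
print(html)
```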