Recently I’ve studied many tutorials and video courses about Python. I have also done a fair amount of practical exercises on real-world Python projects, on my Fedora Linux notebook and even with CPython on Android.
In this post the focus is on Pandas, since at the moment we are all concerned with statistics and with data extraction, whether from Excel files or from any (hyped) big data source. Along the way I will also mention a couple of scientific papers that rigorously explain the context and the limits of the reproduction numbers R(0) and R(t).
Now, look, in terms of the quality of Excel data extraction, nothing beats C# or Python openpyxl. For example, with ClosedXML in C# you can easily and accurately get a DateTime by combining the values of multiple cells. Or, in Python with openpyxl:
>>> worksheet = workbook.worksheets[0]
>>> type(worksheet)
<class 'openpyxl.worksheet.worksheet.Worksheet'>
>>> worksheet.cell(2,24).value
datetime.datetime(2020, 4, 1, 0, 0)
>>> worksheet.cell(2,25).value
datetime.time(12, 12, 3)
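The session returns the date and the time of day as two separate values. Here is a minimal sketch of merging them into a single timestamp using only the standard library. The helper name combine_date_time is hypothetical, and it assumes, as in the session, that the date cell yields a datetime.datetime (or datetime.date) and the time cell a datetime.time.

```python
import datetime


def combine_date_time(date_value, time_value):
    """Merge a date cell value and a time cell value into one datetime.

    Hypothetical helper: assumes the date cell holds a datetime.datetime
    (with a 00:00 time part) or a datetime.date, and the time cell a datetime.time.
    """
    # Check datetime first: datetime.datetime is a subclass of datetime.date.
    if isinstance(date_value, datetime.datetime):
        date_part = date_value.date()  # drop the midnight time part
    elif isinstance(date_value, datetime.date):
        date_part = date_value
    else:
        raise TypeError(f"expected a date, got {type(date_value).__name__}")
    if not isinstance(time_value, datetime.time):
        raise TypeError(f"expected a time, got {type(time_value).__name__}")
    return datetime.datetime.combine(date_part, time_value)


# Values copied from the REPL session: cell(2, 24) and cell(2, 25).
stamp = combine_date_time(datetime.datetime(2020, 4, 1, 0, 0), datetime.time(12, 12, 3))
print(stamp)  # 2020-04-01 12:12:03
```

With a loaded worksheet, the call would be combine_date_time(worksheet.cell(2, 24).value, worksheet.cell(2, 25).value). The explicit type checks make a malformed cell fail loudly instead of silently producing a wrong timestamp.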
Python Pandas makes sense only for big data with a standard, well-defined and simple input format. The same applies to F#, which also suits pure, abstract, logical aggregation, especially inside .NET web platforms. C# or Python openpyxl are the most powerful options for the disparate formats typical of business use cases. Note, however, that performance might be an issue for openpyxl; as its documentation puts it:
Memory use is fairly high in comparison with other libraries and applications and is approximately 50 times the original file size, e.g. 2.5 GB for a 50 MB Excel file.
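For large files, openpyxl offers a read-only mode that streams rows lazily instead of building every cell object in memory. Here is a minimal sketch, assuming a simple hypothetical sheet with a header row and two columns (region, cases). The sample file and the helper sum_cases_by_region are made up for illustration. The read_only, iter_rows(..., values_only=True) and close() calls are openpyxl's own.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def sum_cases_by_region(path):
    """Stream the first sheet row by row and total the 'cases' column per region."""
    wb = load_workbook(path, read_only=True)  # lazy, low-memory reader
    try:
        ws = wb.worksheets[0]
        totals = {}
        # min_row=2 skips the header; values_only=True yields plain tuples of values
        for region, cases in ws.iter_rows(min_row=2, values_only=True):
            totals[region] = totals.get(region, 0) + cases
        return totals
    finally:
        wb.close()  # read-only workbooks keep the file handle open until closed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xlsx")  # hypothetical sample file
    sample = Workbook()
    sheet = sample.active
    sheet.append(["region", "cases"])
    for n in range(1, 1001):
        sheet.append(["North" if n % 2 else "South", n])
    sample.save(path)
    totals = sum_cases_by_region(path)

print(totals)  # {'North': 250000, 'South': 250500}
```

The trade-off is that read-only worksheets cannot be edited and cells cannot be accessed randomly as cheaply. This is fine for one-pass extraction, less so for the cell-by-cell composition shown earlier.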
So, all things considered, C# appears to be the best choice.
Finally, a note about two interesting scientific papers that are well worth reading:
Notes On R0, James Holland Jones, Department of Anthropological Sciences, Stanford University, from which I quote:
This result has a nice geometric interpretation. The ESS virulence occurs where a line rooted at the origin is tangent to the curve that relates β to δ. This result is known as the Marginal Value Theorem and has applications in economics and ecology as well as epidemiology.
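The tangency result can be checked numerically. Here is a minimal sketch assuming a common simplified form R0(δ) = β(δ)/(c + δ). In it, δ is virulence (disease-induced mortality), c lumps background mortality and recovery, and β(δ) is a hypothetical concave trade-off, here √δ with c = 1. None of these choices are taken from Jones's notes. At the R0-maximising virulence δ*, the slope of β equals β(δ*)/(c + δ*), so the line from (−c, 0) is tangent to the β curve. If β is plotted against the total removal rate c + δ, that point becomes the origin, as in the quote.

```python
import math

MU_PLUS_NU = 1.0  # hypothetical c: background mortality + recovery rate


def beta(delta):
    """Hypothetical concave trade-off between transmissibility and virulence."""
    return math.sqrt(delta)


def r0(delta):
    """R0 under the assumed form beta(delta) / (c + delta)."""
    return beta(delta) / (MU_PLUS_NU + delta)


# Grid search for the virulence that maximises R0 (delta in (0, 10]).
grid = [i / 10000 for i in range(1, 100001)]
delta_star = max(grid, key=r0)

# Tangency check: slope of the beta curve vs slope of the line from (-c, 0).
h = 1e-6
slope_curve = (beta(delta_star + h) - beta(delta_star - h)) / (2 * h)
slope_line = beta(delta_star) / (MU_PLUS_NU + delta_star)
print(round(delta_star, 3), round(slope_curve, 3), round(slope_line, 3))  # 1.0 0.5 0.5
```

Analytically, setting the derivative of √δ/(1 + δ) to zero gives 1 + δ = 2δ, so δ* = 1 and R0* = 1/2. The grid search reproduces this, and the two slopes coincide there.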
Theory versus Data: How to Calculate R0?
Romulus Breban, Raffaele Vardavas, Sally Blower. This paper offers another very important and instructive lesson for our time:
Another approach (which is more commonly used) is to obtain R0 from population-level data, namely cumulative incidence data… It is very important to note that the individual-level modeling assumptions cannot be verified using population-level data (i.e., they remain hypothetical). ODE models are formulated in terms of disease transmissibility and progression rates at the population level. These parameters are obtained by fitting the model to population-level data; their relation to the individual-level processes may be quite complex and is generally unknown… Therefore, population-level predictions based upon an ODE model that use the R0 value found by contact tracing as a threshold parameter may be inaccurate. Our novel results have significant implications for understanding the dynamics of outbreaks of infectious diseases, particularly for the biological understanding of the transmission dynamics of the pathogen, estimating the severity of outbreaks, making health policy decisions, and designing epidemic control strategies.
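To make the population-level route concrete, here is a minimal sketch on synthetic data, not real outbreak data. It simulates an SIR model with hypothetical rates β = 0.3 and γ = 0.1 per day, so R0 = β/γ = 3. It then fits the early exponential growth rate r of cumulative incidence with a log-linear least-squares fit and converts it using the SIR relation R0 = 1 + r/γ. This growth-rate method is a simpler stand-in for fitting the full ODE model. The conversion only works because γ and the model structure are assumed, which is exactly the kind of assumption the authors say population-level data cannot verify.

```python
import math

# Hypothetical "true" parameters, used only to generate synthetic data.
BETA, GAMMA = 0.3, 0.1        # transmission and recovery rates per day (R0 = 3)
N, I0 = 10_000_000, 10.0      # population size and initial infected
DT, STEPS_PER_DAY, DAYS = 0.01, 100, 40


def simulate_cumulative_incidence():
    """Euler-integrate an SIR model and return daily cumulative incidence."""
    s, i, cum = N - I0, I0, I0
    daily = [cum]
    for step in range(1, DAYS * STEPS_PER_DAY + 1):
        new_inf = BETA * s * i / N * DT
        recovered = GAMMA * i * DT
        s -= new_inf
        i += new_inf - recovered
        cum += new_inf
        if step % STEPS_PER_DAY == 0:
            daily.append(cum)
    return daily  # daily[t] = cumulative incidence at day t


def fit_growth_rate(days, values):
    """Least-squares slope of log(values) against days."""
    ys = [math.log(v) for v in values]
    mx, my = sum(days) / len(days), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(days, ys))
    den = sum((x - mx) ** 2 for x in days)
    return num / den


cumulative = simulate_cumulative_incidence()
window = list(range(15, 36))                  # assumed early exponential phase
r_hat = fit_growth_rate(window, [cumulative[d] for d in window])
r0_hat = 1 + r_hat / GAMMA                    # SIR relation r = gamma * (R0 - 1)
print(f"growth rate {r_hat:.3f}/day, R0 estimate {r0_hat:.2f}")
```

Changing the assumed γ while keeping the same data changes the R0 estimate, even though the fitted growth rate stays the same.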