Sunday, February 28, 2010

Mean Reversion



Tech rebound, the news says. Vast armies of savvy investors ready to pitch their dollar for the signs of hope for the next big one.

Reflecting on the 2009 DJIA. No wonder Goldman had such a spectacular year - the big rebound started on Mar 06. From the 6626.94 rock bottom to a 10k Dow in mid-October. It could have been a bad year only for the illiquid ones. But let's see what happens now that the bear has arrived. Funny BusinessWeek article - breed 'em then buy 'em is a great model, but what about moral hazard implications ?

Fluctuating economy. No rock to stand on. The next best thing is finding the right-angle confluence of laminar flows.

The Risiko board game. Growing cultures behind closet doors, the corporate glass gardens. Microcosms of their own. Additivity taxonomies for base set of random variable statistics. In search of a reference. Covariance - raison d'être of portfolio theory.

Scattered particles filling the void. An empty promise. A glimpse of hope. At least the Liar's Poker is over.

Friday, February 26, 2010

Continuum Beanbag Revisited



Alternative risk measures. Bond duration limits. One size doesn't fit all. Apples and Oranges Lemma.

Goodbye Pork Pie Hat. Computational complexity of financial products. Intractability of portfolio-product decomposition. Reverse the efficient set. Duration-based bond selection. International Finance discussions. Monte Carlo methods for bond default analysis.

More efficient sets. Jensen-Renyi Divergence. It never entered my mind. Big questions. Markowitz's mean-variance analysis. Central Park West. Not a good day for staying focused. Irrational promises of seemingly endless weekends.

Subadditivity of asset variances. Reflections of George Polya and student days on vintage wood beneath the green lights. Cheap Russian translations of academia masterpieces. MIT WWII-supply Electromagnetic textbooks on war-reserve recycled paper. Obsolete vacuum structure theories. Maxwellian reality.

Deadline Superhighway



Thought monologue. Finding correlations in data when there aren't any. It's harder when you can't even do that. Heavily biased data sets.

Static portfolio allocation with continuous changes in asset covariance ? Evolution of indifferent-to-distribution investor idea. Pursuit for "real" risk measure. Unfortunately it's probably investor-specific. Economic theory is not happy with such things. Socially responsible investing, anyone ? Required minimum levels ?

Simple Hadoop reporting framework for the masses. Cutting's "big spreadsheet" idea. Continuum bean bag.

More axis fun. Eternal compatibility struggle. Revenge of WS-Addressing. Mapping return codes to wsdl.

Big data. Seeking rho. Still in need of convenient ad hoc analysis solution. R as eternal de facto. Getting used to 0.20 API Context Objects.

Thursday, February 25, 2010

There it goes again



The Smiths on the radio. Grey morning heading for the afternoon. Messaging ways. Enterprise integration patterns.

Collisions in java logging forest. Interoperability, people ? Better projects do help development, but in the long run they might create unfair competition. Software projects economy. Jetty centralized slf4j logging colliding with qpid invoking slf4j directly. Collapse occurs. Loggers killed the application embedding. Quick hack - replacing slf4j-log4j with slf4j-simple. That's that. There are more interesting problems awaiting.

Dealing with confusions of async message delivery. Queue-driven runtime. Indexing dynamic data sources ? Controlled environment for thread handler execution. Long lost microseconds on the blocked threads highway. Visions of the times from long before. Yet, they are still waiting.

Finishing the pipeline. How do we address testability ? XML in the fast lane. Losing a bracket here and there. Won't work, we do care. Marshalling sql updates. Full cycle. Discussing the world and the limits of expectations. Desire to belong to an imagined perfect world, just to realize that the real one doesn't seem worth dreaming. Still, start scratching the surface and bricks occur. The decaying plaster.

Following Hadoop steps. Giant footprints in data mud.

Wednesday, February 24, 2010

Implied Latency



Last minute evasive maneuvers. Simple return types. Builder pattern in partial object creation. The right tool for the job. Walking back down the UML boulevard. Small joys with yUML. Visual Paradigm UML seems like a serious OS/X option. Indent wars.

Picking battles. Sometimes a bit of abstraction for no particular reason can be good. An unexpected offer. Quite pleasing. Alternative take on Builder pattern.

Betrayed by IntelliJ IDEA. Never undo refactoring if the original sources weren't committed! Swapping objects. Finding suitable home for classes used among packages. Dependency clash. Safety when converting to local classes, even if it took manual bean coding time. Spring cold kicking in. XML serialization for queue messages.

Back to qpid/jms. Vintage soundtracks. Dreams from far away. Logging too verbose. Difficulties with servlet JNDI. AMQNoRouteException. Back to queue troubleshooting. Yet, for another day.

Tuesday, February 23, 2010

Quiet Slipstream



Infrastructure updates. Troubled by AXIS2-3017. Street murmur. Having trouble stating the obvious. Portfolio selection vs futures position for profit lockdown ? Discussing distributed filesystem solutions for simplifying the design of replicated-data-store. Additional constraint of high availability and only tolerance for partial data access failures. Plus requirements to allow storage of large number of small files. Tricky. Lustre seems like a promising candidate. Time waits for no man.

Turing completeness of queue-based computation. Becoming a regular reader. Message to Object demultiplexing. Still working on compatibility issues. JAXB to the rescue. POJO-XML serialization model has dramatic limitations. Finally ended up doing the thing semi-manually. At least it's over.

Friendly talks with drunk walkers at Knez Mihajlova street.

Monday, February 22, 2010

Midtown Music




Spring preview in the streets. Pure bliss. Increasing entropy of the coffee shop. The eternal pursuit for the front view. Small things as contemporaries in the substance chronicles. Illustrations of the Curse. Diminishing distances with dimensionality increase. Combinatorial explosion of model-space. Increasing multicollinearity probability.
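One textbook illustration of the Curse, sketched here under the assumption that data is spread uniformly over the unit cube in R^p: to capture a fixed fraction of the points, a sub-cube needs an edge of fraction^(1/p), so "local" neighborhoods stop being local as p grows. The function name is made up for this note.

```python
def edge_length(fraction: float, p: int) -> float:
    """Edge of a sub-cube of the unit cube in R^p that contains
    `fraction` of uniformly distributed data (volume = edge ** p)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if p < 1:
        raise ValueError("dimension p must be >= 1")
    return fraction ** (1.0 / p)


if __name__ == "__main__":
    # Edge needed to hold 1% of the data as the dimension grows.
    for p in (1, 2, 10, 100):
        print(p, round(edge_length(0.01, p), 3))
```

In 10 dimensions, holding just 1% of the data already takes roughly 63% of the range of every coordinate.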

Fitting local structure of function. Simple mahout dimension-decomposition job. Locally-low dimension of function. 1-cube in pow(R,10) - seems Gaussian to a naked eye. General framework for estimating default risk by variable selection from media sources. Integrate mahout lib functions to voidbase for realtime/batch infrastructure.

Mixing queueing and state maintenance. Direct POST to soap service. Assuming all message conventions are satisfied, it should work right away. Deserialization will cause issues. All you need is a properly formatted soap message, raw post, and there you go. Stay away from complex xml types.
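A rough sketch of the raw-POST idea, assuming a SOAP 1.1 service. The endpoint URL, SOAPAction value and body element used in the usage lines are hypothetical placeholders, not taken from any real service; the request is only built here, not sent.

```python
import urllib.request

# SOAP 1.1 envelope namespace.
SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def soap_envelope(body_xml: str) -> str:
    """Wrap an already well-formed XML fragment in a SOAP 1.1 envelope."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<soapenv:Envelope xmlns:soapenv="{SOAP11_NS}">'
        f"<soapenv:Body>{body_xml}</soapenv:Body>"
        "</soapenv:Envelope>"
    )


def soap_request(url: str, action: str, body_xml: str) -> urllib.request.Request:
    """Build (but do not send) a raw HTTP POST carrying the SOAP message."""
    data = soap_envelope(body_xml).encode("utf-8")
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{action}"',
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


if __name__ == "__main__":
    # Hypothetical endpoint and operation, for illustration only.
    req = soap_request(
        "http://localhost:8080/axis2/services/EchoService",
        "urn:echo",
        '<ns:echo xmlns:ns="http://example.org/echo"><ns:text>hi</ns:text></ns:echo>',
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```

Keeping the body to flat, simple elements is in line with the "stay away from complex xml types" note: the less the service has to deserialize, the fewer surprises.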

Fahey's strings. Deciphering fragments of autogenerated code. Axis2 Java Bean camelCase conventions on xml elements. No interoperability with legacy .net apps. Much Ado About Nothing.

Sunday, February 21, 2010

Sidewalk daydreaming



Macaulay's bond duration factor. Times spread the unemployment fear. Summoning the Depression. The Great one. Because bleak zeitgeist is better than searching for truth.

To bonds, my friends! Critics of interest rate sensitivity metrics. Getting unbiased sample from heavily biased data. Using derived measures which are not obtained via one-to-one mapping can result in data loss. Using such values for comparison of two values is only applicable if the transformations are linear. Equivalent measures. Looking at yield curves, historical bond prices.
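For reference, a minimal sketch of the duration metric in question, assuming annual compounding and a flat yield. Function names and the sample bond are made up for illustration.

```python
def bond_price(cash_flows, y):
    """Present value of (time_in_years, amount) cash flows
    at an annually compounded flat yield y."""
    return sum(cf / (1.0 + y) ** t for t, cf in cash_flows)


def macaulay_duration(cash_flows, y):
    """Present-value-weighted average time of the cash flows."""
    price = bond_price(cash_flows, y)
    weighted = sum(t * cf / (1.0 + y) ** t for t, cf in cash_flows)
    return weighted / price


def modified_duration(cash_flows, y):
    """Approximate relative price change per unit change in yield."""
    return macaulay_duration(cash_flows, y) / (1.0 + y)


if __name__ == "__main__":
    # Hypothetical 3-year bond, 5% annual coupon, face value 100, yield 5%.
    bond = [(1, 5.0), (2, 5.0), (3, 105.0)]
    print(bond_price(bond, 0.05))
    print(macaulay_duration(bond, 0.05))
    print(modified_duration(bond, 0.05))
```

Duration collapses the whole cash-flow schedule into one number, which is exactly the kind of non-one-to-one derived measure the note is wary of: two quite different bonds can share the same duration.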

Token filters. Having to recreate index with each change promotes predicting future at design-time. Feed mixing troubles. The revenge of monolithic renderer. Make sure to force modularity next time, even for trivial cases.

Blue in Green. The essential Miles at midnight. Continuous compounding requires reinvesting in the same or equivalent bond. However, prices change with time. Flamenco Sketches. Sound transforming the void. Spot rates - quite a clever one. No arbitrage argument in determining forwards. Assessing the bond default risk. Companies too complex to be evaluated by Moody's. Reducing everything to cash flow analysis does prevent creative business. Evading the trap by not issuing bonds. But are there enough individual "creative" investors that can provide necessary funds ?
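The no-arbitrage argument for forwards can be sketched as follows, assuming annually compounded spot rates: investing to t2 directly must grow as much as investing to t1 and rolling over at the forward rate. The function name and sample rates are illustrative only.

```python
def forward_rate(s1, t1, s2, t2):
    """Annually compounded forward rate between t1 and t2 (years)
    implied by spot rate s1 to t1 and spot rate s2 to t2.

    No arbitrage: (1 + s2)**t2 == (1 + s1)**t1 * (1 + f)**(t2 - t1)
    """
    if not 0 <= t1 < t2:
        raise ValueError("need 0 <= t1 < t2")
    growth = (1.0 + s2) ** t2 / (1.0 + s1) ** t1
    return growth ** (1.0 / (t2 - t1)) - 1.0


if __name__ == "__main__":
    # Hypothetical curve: 3% one-year spot, 4% two-year spot.
    print(forward_rate(0.03, 1, 0.04, 2))
```

With an upward-sloping curve the implied forward sits above both spot rates; with a flat curve it equals the spot rate.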

Saturday, February 20, 2010

Ailing Machinery




A week of blogging. Ingredients of basic distribution-analysis with Mahout ? Hadoop debugging via simple log service. Missing New York. Reflecting on current state of information security. Jetty-based hadoop log service in 10 minutes. Maven has made simple app development feel like brick-packing. Trivial codebase mixing with direct pushes to local repo. Unfortunately, adding dependencies to a complex maven hierarchy can be challenging. Used to Maven copy-dependencies to lib/ , totally forgot about Hadoop ways and the put-it-all-in-task-jar paradigm. Oh, well. We're good now.

Betrayed by the data. Creating algorithms that promise data-magic can be quite ungrateful at the payday. Don't commit to results. Get outlook estimates fast. Always keep the pessimistic edge. Checking out multipart MIME in support of http streaming work.

Iterating on Mahout design. Cabernet taste. Simple distribution statistics task in progress. Moving to 2-space indent. Going for hadoop-free mahout tasks. Interesting set of *Job interfaces, not sharing anything but runJob() convention. The chaos of dfs files management. Can't run two processes in parallel. Summoning trouble. DFS might allow concurrent access, but code usually doesn't. Visualizing general tree-like code architecture.

Mahout-157 Frequent Pattern Mining using Parallel FP-Growth. Interesting. Machine Learning in Computational Finance, Victor Boyarshinov PhD thesis. Interesting overview of optimal trading strategies training and optimal separation learning. However, in practice, all of the useful topics actually fall under "Computational Statistics" umbrella.

Time. Ticking away.

Friday, February 19, 2010

Dimensionality redux



Costa sunshine. Finally, a reason to go outside. Ultrahigh-dimensional variable selection as extreme case of Curse of Dimensionality. Machine Learning for common people. Haystack of simple algorithms easily embeddable to every web page. Scalable machine learning api ! Too much learning should skew the data. Approaching computational bounds and inseparability. Freedom. Get ready for Mahout weekend.

The mean integrated squared error of fits increases faster than linearly in p. That's the curse for you. Yellow buses marching down the boulevard.

Fantastic! Efficient frontier is actually derived by constrained maximization of the objective (risk/return) function, under unit sum of asset ratios constraint. Lagrangian multiplier derivation in no time.
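A sketch of that derivation in one common textbook form (minimizing variance for a target return r, which is an assumption about the formulation; the symbols w, mu, Sigma, lambda, gamma are introduced here for illustration):

```latex
\begin{aligned}
&\min_{w}\ \tfrac12\, w^{\top}\Sigma w
\quad\text{s.t.}\quad w^{\top}\mu = r,\qquad w^{\top}\mathbf{1} = 1 \\[4pt]
&\mathcal{L}(w,\lambda,\gamma) = \tfrac12\, w^{\top}\Sigma w
 - \lambda\,(w^{\top}\mu - r) - \gamma\,(w^{\top}\mathbf{1} - 1) \\[4pt]
&\nabla_{w}\mathcal{L} = 0
\ \Longrightarrow\ w = \Sigma^{-1}\,(\lambda\mu + \gamma\mathbf{1}) \\[4pt]
&A = \mathbf{1}^{\top}\Sigma^{-1}\mathbf{1},\quad
 B = \mathbf{1}^{\top}\Sigma^{-1}\mu,\quad
 C = \mu^{\top}\Sigma^{-1}\mu,\quad
 D = AC - B^{2} \\[4pt]
&\lambda B + \gamma A = 1,\qquad \lambda C + \gamma B = r
\ \Longrightarrow\
\lambda = \frac{A r - B}{D},\qquad \gamma = \frac{C - B r}{D} \\[4pt]
&\sigma^{2}(r) = w^{\top}\Sigma w = \lambda r + \gamma
 = \frac{A r^{2} - 2 B r + C}{D}
\end{aligned}
```

The last line is the frontier itself: a parabola in (r, sigma^2) space, with the global minimum-variance portfolio at r = B/A. This assumes Sigma is invertible and short positions are allowed; with no-short constraints the problem turns into quadratic programming.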

Production code releases. Is there a worse thing ? Boredom mixed with fear of failure. Ok, occasionally there is a twofold pleasure - both because something is accomplished and because something is finally over. However, weekly releases prevent pleasure from lasting long. Damn Agile. Ok, they might make the post-release pressure smaller because anything that's broken can be fixed next week, not next year. Ah, the tradeoffs !

Reflecting on variability of Java/NIO performance across various linux environments. No free lunch, apparently. Linux distributed storage solutions. Benchmarking on IOPS. Mahout ways. Getting the simple dev environment up. Nice maven hierarchy. Pollution. Politics and dancing.

Thursday, February 18, 2010

The accidental glitter



Portfolio diversification, now ! No arbitrage in securities markets. The value of put+call portfolio on the same security should be less than or equal to zero. The Living Life. Can't beat them all. Back to small features. Code is often grateful.

All day no keyboard. Sort of.

Dealing with one-offs. Hate to do it. Thinking about correlation coefficient. Working code | reading about portfolio analysis. Risk-increasing portfolios ? Can we really construct arbitrary risk functions ? Risk beyond the classical MPT "variance of normal distribution" definition. Risk function as expected value of loss function given distribution estimator and appropriate parameter. Still trying to figure out the analytical derivation of efficient frontier.
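One way to read "risk function as expected value of loss function", sketched under the assumption that the distribution estimator is just the empirical sample of returns. The loss functions and names below are illustrative choices, not a standard API.

```python
from statistics import fmean


def expected_loss(returns, loss):
    """Empirical risk: the average of loss(r) over observed returns,
    i.e. the expectation under the empirical distribution."""
    return fmean([loss(r) for r in returns])


def variance_risk(returns):
    """Classical MPT risk: squared deviation from the mean as the loss."""
    m = fmean(returns)
    return expected_loss(returns, lambda r: (r - m) ** 2)


def downside_risk(returns, target=0.0):
    """Alternative risk: only shortfalls below `target` are penalized."""
    return expected_loss(returns, lambda r: min(r - target, 0.0) ** 2)


if __name__ == "__main__":
    sample = [0.1, -0.1, 0.1, -0.1]  # made-up returns
    print(variance_risk(sample), downside_risk(sample))
```

Swapping the loss changes what counts as risk without touching the rest of the machinery, which is where the investor-specific part comes in.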

Tired. Sleepy. Gone.

Wednesday, February 17, 2010

Infrastructure Bliss



Enterprise+Adhoc service multiplexer bridge. Near-similar data mapping. Quick Levenshtein/Hamming distance string matching on large datasets. O(N^2) comparison operations. Infrastructure work. Abstract the n-tier multiplexer objects. Look at Apache Camel for ideas. Soap return types, mapping enums both ways. Please. Work! Mapping POJO enums - not really out of the box. Find clues. Ambiguous object/xml serialization - nontrivial to resolve. My bad - wsdl is not about syntax - only semantics => enum values should not be part of that description. Casting an eye back to the curse of dimensionality. Maven troubles.
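A minimal sketch of the brute-force matching mentioned above: both distances, plus the all-pairs comparison that makes it O(N^2). The record strings in the usage lines are made up.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of differing positions; defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert/delete/substitute), two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def near_pairs(items, max_dist):
    """All unordered pairs within max_dist edits: N*(N-1)/2 comparisons."""
    return [(a, b) for a, b in combinations(items, 2)
            if levenshtein(a, b) <= max_dist]


if __name__ == "__main__":
    names = ["acme corp", "acme corp.", "apex ltd"]
    print(near_pairs(names, 1))
```

On large datasets the pair loop dominates, so in practice some blocking step (for example, grouping by length or by a shared prefix) would be needed before comparing.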

SQL update exec wrapping in general "message update" framework + the bus for dispatching it all. Command message dispatched bus, with command message being abstract. Return message ? Yup - abstract + handle that as well. Distributed command objects. ADB Soap client quickspawn. Thinking about the notion of "information maximization" in the context of visual display on bounded surface.

The "door-mat" organizational pattern. It exists. It works.

ADB soap client generation is the way to go. Calling WSDL2Java from command line is a drag - use org.apache.axis.wsdl.WSDL2Java instead. Hate calling main() but that's life. 7 more hours to go. Stub is there but still need to move / no tree offset. I can live with that. Neighborhood component analysis talks at #machinelearning. Cover trees seem interesting. That's that for you. Ok, how do I use service stubs ? 11K lines of generated code. Sweet. Now what ?

QName = (namespace, localpart (method)). Maven/Ivy wars at #hbase. Building xml as a pile of objects is not the most convenient when the entire content is changing across all fields. However, single-field changes do leave space for convenient architectural solutions. The good part is that memory is actually relatively cheap in this case, though it doesn't really seem like it. Pile of access methods, but code generation makes it possible to keep the data simple. Protocol verified. Now figure out axis2 document root redirection. Stereolab in the headphones. Pure bliss. Even enterprise POJOs don't seem so boring. Ok, perhaps they do.
And they don't really work all that easily.

Not happy. Not happy at all. Framework/Enterprise/Unknown unknowns - based development makes me sad.

http://issues.apache.org/jira/browse/AXIS2-4128 <- tough luck, not generating enumerations properly. Well, at least it's not me.

Tuesday, February 16, 2010

More Italian taste



Noon@costa. Is algorithmic trading, by definition, bound to the high-frequency domain ? Treasury bills as riskless securities forming the baseline for the riskless rate of interest. How about if we remove this assumption and account for risk of default, which, to make things worse, cannot be estimated from historical data. Fluctuating finance. "Give me the rock to stand on". What is the rock? Portfolio analysis is based on sampling the distribution of underlying assets, but though we're estimating the distribution, there are always unknowns. A well diversified portfolio should be able to handle even the unknowns, but the system risk is what spoils the party. A lot of effort is spent on eliminating this risk, but still no cigar. What if we simply need to acknowledge system risk as a fact of life rather than hoping we'll mitigate it ? Can modern finance still work ?

One of the alternative approaches would simply be the "best effort" finance, where we simply discount systematic risk. For example, the notion of "covariance" is dramatically skewed by systematic risk. Discount it.

Data generalization issues. Is simple json descriptive enough ? Troubles with maintaining piles of xml configs. Meeting expectations while creating buffer - an eternal challenge. Wrap soap objects in jms serialization - automatic for the people ? Migrating old production code to new release is such a drag.

Save the Platanus in King Alexander Boulevard. Endless dreamy days in front of the University Library.

Monday, February 15, 2010

Delayed Awakening



QPid jar chaos. The underlying JNDI implementation - is it jvm-wide ? Clearly having a single data/object lookup is a convenient way of managing objects within jar conglomerates. Ideally - all objects on the same jvm would have access to the same lookup. However, the regular questions pop up - locking / deadlock conditions / performance / .. JNDI is used by JDBC/JMS/EJB. With JMS we use JNDI to register JMS administered objects.

commons-collections framework / Bag interface & others. An essential toolchain entry. Maven's never-ending check for updates on a timing-out apache repo. Wasted time. QPid test app finally working on a minimum jar-bag. Benchmarking qpid stability/performance with random queue / entry generation. The real danger is the unhandled exceptions. Stupid maven pom update. Dynamic nature of JMS env can also be a challenge for stability. Finally bored of apache repo stalls, adding -o flag to maven builds. Offline it is.

Back to QPid world. Figure out the addressing, user/conf etc. When connecting to qpid server - the following is needed : (user, pass, clientid, host, pass, message_queue, routing_key, ?).

Thinking about an efficient way to organize all of the personal info sources : (quantnet, Wilmott forums, ny times, google reader stuff, irc chats, dzone) ; rss is a part of the solution - integrated to google reader, however - no solution for non-rss sources + google reader does not reflect the fundamental way I would like to search these sources (no "fairness" mode - at least not clearly visible, a clearer UI is needed, a different way to represent sources visually) - need integrated bookmarks + events somehow. Reader home page is ok, but does not solve anything, though it is a model. Extension of graph-placing-like problems to general visual representation (via constraint minimization). Relating that to portfolio optimization (abstractly) ?

QPid topologies : p2p (client creates named queue, publisher publishes to queue with key mapping to queue name, client consumes), one-to-many (bind/publish to 'fanout' exchange), pub-sub (consume streams matching a pattern ?), fast-reliable-messaging, transactional, transient (non-durable messages), durable (with header defining durability), federation (link brokers with qpid-route and then create static/dynamic routes between brokers, resulting in graph-like structure). Going with p2p/named queues for starters. Very cool: a perftest is provided with qpid (for java use QpidBench). Interested in message-size-dependency of throughput/latency. Don't forget to setup persistence store / benchmark performance vs durability etc.

Idea - rss reader with ranking of the feeds. Interaction + learning => optimal information dashboard.

qpid / guest/guest is a valid default user/pass, which leaves us only with determining connection/routing stuff.

issues with lifetime/scope of jndi Context object. " A Context instance is not guaranteed to be synchronized against concurrent access by multiple threads. " - interesting. Issues with Context as global variable - quasi-solution by creating it locally.

install artifacts locally, awesome! :

mvn install:install-file -Dfile=lib/qpid-all.jar -DgroupId=org.apache.qpid -DartifactId=qpid-all -Dversion=0.5 -DgeneratePom=true -Dpackaging=jar

The jms/qpid p2p queue up&running. Some adhoc benchmarks - 1000 messages / 2129 ms (including println); 100k messages - out of memory (client side clobber) ; 10k messages - slowdown on receive ; Simple pub/sub does not solve the 'dequeue' problem - we need to delete entries explicitly.

Checking Sun's JMS documentation, trying to figure out a simple enqueue/dequeue process with persistence. Definitely in the PTP domain. Revisiting fundamentals - ConnectionFactory creating Connection using Destination-described destination, via MessageProducer/MessageConsumer, created from current Session. ConnectionFactory and Destination are found via JNDI.
PTP : (QueueConnectionFactory, QueueConnection, Queue, QueueSession, QueueSender, QueueReceiver + (TemporaryQueue), QueueBrowser)

QueueBrowser as an example of scalable access to distributed resources. size() is an overkill, iterator wins. Not really working in this case, though.

Late night flipside - voidbase work. Integrate online news sources and let it roll. The call for the night - working token frequency for arbitrary source.
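A bare-bones sketch of token frequency over arbitrary text, assuming a crude lowercase word tokenizer (the regex is a simplification and the function name is made up; real feeds would need markup stripping first):

```python
import re
from collections import Counter

# Crude tokenizer: runs of lowercase letters, digits and apostrophes.
TOKEN = re.compile(r"[a-z0-9']+")


def token_frequencies(text: str) -> Counter:
    """Count token occurrences in a piece of text, case-insensitively."""
    return Counter(TOKEN.findall(text.lower()))


if __name__ == "__main__":
    freq = token_frequencies("Big data. Big questions. Seeking rho.")
    print(freq.most_common(3))
```

Since Counter objects add together, per-source counts can be merged with `+` or `update()` when several sources are mixed.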

The pulse of the world : http://twitter.com/statuses/public_timeline.rss

Sunday, February 14, 2010

gray & white | soul delight



Database design / denormalization / large tables and triggers in minimizing query complexity. Reflecting on finance hype of 2007. The future seemed blindingly bright. Greece default reflecting on Wall Street. Financial Engineering as a means of concealing debt. Blame it on the accounting practices. The Goldman Monster.

Saturday, February 13, 2010

Saturday Sun




The dot/superdot revolution reflected on the lotus 1-2-3 CRT. Efficient portfolio via quadratic programming in pursuit of the efficient frontier. Alpha/Beta wars. Much needed change after a JMS day in search for redundancy with hardware constraints. Long journey on a Ghibli note with the thermal pillars. Contemplating joys of life with the innocent art.

Hooking unix mainframes to trading floor data, 30 years later and still staying strong. 30 years of real time correlation tracking. However, the notion of market data has vastly expanded. Social context and behavioral tracking.

Dynamics of real time processing boundaries in the last 30 years. ?
get the Moore vs data complexity graphs

extending algo trading techniques beyond the financial ... online games (how?), social networks (why?) , information security, intrusion detection/aversion, generalized "game" context - 1-1 - 1-N. online auctions ? bidding systems, multiuser-interaction systems

algo trading is about creating and exploiting arbitrage opportunities, while measuring and minimizing risk/reward.